
Monolith to Microservices using Kubernetes at Guidewire

Kubernetes Bytes

In this episode of the Kubernetes Bytes podcast, Ryan and Bhavin sit down with Diego Devalle and Anoop Gopalakrishnan from Guidewire to talk about how they went through an application modernization journey and adopted Kubernetes and cloud over the last 5 years. Diego and Anoop share their experiences around how they drove this modernization inside Guidewire by both championing organizational change and introducing Kubernetes and cloud technologies, while at the same time ensuring that they serve their existing customers in the insurance industry.

Check out our website at https://kubernetesbytes.com/ 

Transcript

Introduction to Kubernetes Bytes

00:00:03
Speaker
You are listening to Kubernetes Bytes, a podcast bringing you the latest from the world of cloud native data management. My name is Ryan Wallner, and I'm joined by Bhavin Shah, coming to you from Boston, Massachusetts. We'll be sharing our thoughts on recent cloud native news and talking to industry experts about their experiences and challenges managing the wealth of data in today's cloud-native ecosystem. Good morning, good afternoon, and good evening, wherever you are. We're coming to you from Boston, Massachusetts. Today is Friday, November 1st. Hope everyone is doing well and staying safe.

November Weather and Lifestyle Discussion

00:00:41
Speaker
How's it going, Bhavin? Busy. Busy. Yeah. Not doing shitty work and being busy, but good work and being busy. In other news, it was 80 degrees yesterday. We have to talk about the weather in New England when it's November. I'm okay with it. Like, yesterday we went for a walk, me and my wife, to Trader Joe's, and we were like, yeah, I'm happy with this. And we're like, global warming and warmer weather in November. I was like, shit, there have to be bigger consequences. The day of 70-plus degrees that I get in November is not worth everything else that might be messed up with the ecosystem. Forget the weather. You just said you can walk to Trader Joe's. I'm pretty jealous about that. Oh, yeah. That's the summer thing that we do. Grocery shopping on foot from spring, when the temperature is above 60, to fall. I mean, I guess that's what I get for living out here versus where you are. If I walk in the other direction, that's Whole Foods. I'm just selling the town of Arlington at this point. Nice. Yeah, I was actually pretty excited because yesterday was Halloween. Oh, yeah. Did you dress up? I was a Dalmatian with my daughter. Oh, wow. Okay. I didn't go full out, but I was a Dalmatian. That's so cool. Yeah. The weirdest thing happened to me trick-or-treating yesterday. We were walking through this neighborhood, totally normal neighborhood, kids out trick-or-treating, and then there's a mini horse. Someone had a mini horse they were walking around with. And I was like, you don't see that every Halloween, do you? Like an actual mini horse? An actual horse, mini horse, hoofing along, leaving presents on the sidewalk, if you know what I mean. I remember Parks and Rec and the whole deal they had with Little Sebastian. I don't know if you saw that series. I don't think it was part of the get-up either. Like, I didn't pay attention. Maybe someone was a cowboy or something. I'm not sure. You still have to rent or do something to get a mini horse. Maybe it was their pet. I have no idea. Because people have mini horses. If you have a pet mini horse, listener, I want to know about it. I want to know how common this is. I didn't expect to see a mini horse near my home. How much time do you spend explaining to people that it's not a pony, not a kid horse, it's just a mini horse? Or answering "can I ride it" questions? That's my daughter. She was like, I want to go see if I can ride it. I was like, I don't think it's like that. Okay. This is not a petting zoo or a ranch where you can ride. Nice. I've seen mini cows on Instagram. Like, people are breeding them somehow. Yeah. They don't grow to the size of full cows, but stay small enough. I want one. How small are we talking? Dog size? Yeah. Dog size. Like a golden retriever or a Great Dane, somewhere in between. So not too big, not too small. This, I have to do more research on. Yeah, mini cows. I don't know if they give you milk or anything, but it's just like a pet. I know you were expecting Kubernetes talk, listener, but we're talking about mini horses and mini cows. Just the way it is. Just the way it is. Okay, okay. Let's switch. I think we

KubeCon Mention and Guest Introduction

00:04:05
Speaker
can stop. Let's get back on track for our listeners. I mean, unless you're, you know... just send us a message on Slack if you want to talk more about the other topics of today. Maybe you're thankful for this, maybe you're not, but we're going to skip the news today. And that is because KubeCon is coming up, and we blast you with news episodes because, if you weren't there, a lot usually happens with press releases and all sorts of stuff. So expect some what-happened-in-the-news episodes post-KubeCon. But we will limit our news for a few episodes to save everybody from too much news — I was thinking of a better word for that. This time I'm not going to KubeCon, so I'll have to get a sneak peek of everything that's happening from your perspective, and then we'll do the whole news episode as a combo. I think it works better that way, because that's what happened with one other KubeCon — I didn't go, but you covered that one. Anyway, we do have some really awesome guests, Anoop and Diego from Guidewire. I will let them introduce themselves, but I'm expecting a fun conversation. So without further ado, no more mini horses and mini cows, let's get Anoop and Diego on the show. Hello and welcome to Kubernetes Bytes, Diego and Anoop. It's so good to have you here. Why don't you both give a little introduction to our listeners about who you are and what you do. Why don't you go first, Diego? Thank you. Thank you. So nice to be here. Excited about the conversation. So my name is Diego Devalle. I work at Guidewire. I run engineering, I run operations and support, and I run product management for the platform. And when we talk about platform, we talk about the infrastructure platform — which is what a big portion of the conversation today will predominantly be about — and the application platform, which is our fabric to build the applications that run on top of the infrastructure. Fantastic. Operations and support. So now the espresso makes sense, I feel like. Anoop, what about you? Yeah, first of all, thanks a lot for inviting us to the show. I've been a listener of your podcast for some time now and loved most of the topics that you guys have been discussing. And to be honest, it's been a vindication of what we have been doing here at Guidewire, firstly, because most of the things that you bring on topic are the things that we've either been doing or looking at for our immediate next steps. But with that, let me start my introduction. My name is Anoop Gopalakrishnan. I'm a VP of engineering here. I work mainly in the platform engineering space, which is what we built out. Diego and I joined Guidewire in 2018, and that's one of the major things that we've been building out since then. Okay. Anoop, thank you for the kind words. I know you had reached out to us before as well, giving kudos, so we really appreciate the feedback. And it's good that we are keeping on track as organizations go through this journey. So before we dive into Guidewire and all the awesome work that you guys have done since 2018 — Diego, can you give us an overview of who Guidewire is, what it does, and who some of its customers are before we talk about the tech?

Guidewire's Cloud Transition Journey

00:07:27
Speaker
Sure. So Guidewire was founded around 2001, 2002. You know, it's always difficult to pinpoint — it's easy to pinpoint, but there is a moment when you're in stealth mode, and then you kind of emerge, and so on. So it's 20-plus years of existence. The goal from the very beginning was to build a solution and a platform for P&C insurance. So imagine the State Farms of the world, the Allstates of the world, the Geicos of the world, that typically run a business that is a combination of offering you a policy for your car, for your home, and then also helping you out on claims and so on. So the company started with claims. They built policy on top of that. Then they attached billing, and then they expanded into analytics and some offerings around analytics. They started, as I said, in the early 2000s. So as you can imagine, the solution was not a cloud solution in any shape or form — cloud didn't even exist. It was a self-managed solution. It's always difficult to use standard terminology with Guidewire, but imagine a self-managed platform as a service, which I don't know if it exists, right? But it was really a platform that was used to build the application. Because every insurer ultimately needed to model their products and their capabilities in their own way, imagine that you give them a platform, a product that is 70% there, and then every customer does the last 30%. And that 30% defines the difference between how carrier one and carrier two differentiate themselves. A world in which typically a tier-one carrier will end up having a lot of developers on top of the Guidewire platform. And so that's what the solution is. And then in 2018, when they hired me, the beginning of the journey was: we want to move this to the cloud. How do we do that? Right, right. Now, interesting. I'm assuming if you have some sort of analysis for investment in the infrastructure, or the sales team is asking for more money from Geico and State Farm, you guys look out for the Super Bowl ads, see who has the most money to spend. I'm just kidding, sorry. No, no. Typically, you know, this is a B2B business in which nobody knows that Guidewire exists. And it's one of those typical things where, when you are in any kind of setting and somebody asks, what do you do? And you say, I work for company X. And they say, oh, I didn't know it. What do you do? It's very different than if you say, I work for State Farm, and somebody will say, oh, State Farm, I know — just because of all the commercials and all those things. So we do the work that makes everything on top work, but we're kind of hidden from the public. That's one key aspect, right? The fact that — I don't know the exact number, Diego, you might — the top one-third of insurance companies are our customers. And so the majority of policies underwritten, or claims, are going through our platform, but very few people actually know of the existence of that platform or the company behind it. Gotcha. Yes. I think that's of key importance, right? Like, the platform plays a key role, a major role, especially with what we are seeing in Florida with a couple of hurricanes, and in North Carolina. There are so many insurance claims, making sure everybody
00:11:08
Speaker
So now, not to go too much into the detail of which customer and so on — but some of our customers had, of course, a lot of insured people that were filing claims when those major things happened. Then, of course, you see a workload and a load on the system that increases. At moments like these, we were asked to sort of delay a platform update because of the concern about what the update could create, and so on and so forth, right? So
00:11:41
Speaker
it starts to become — like, when we were not a cloud solution, you basically shipped a CD, between quote and quote, and then you were off the hook. And then across those years, we became more central to the world. And so when there are those situations, we feel it in a completely different way, because now we run a mission-critical system for the portion of the customers that have switched to our platform, to the cloud solution. And not all of them are there — there is a journey, and a portion of them did that. Makes sense. Makes sense. You know, it sounds like historically you've been sort of a quote-unquote guidewire for these various companies. And when you joined in 2018, you needed to guide yourself through this new journey of moving to the cloud. And I guess my question is, how do you act with confidence to go through this modernization journey and also serve those customers at the same time? What were some of the challenges you saw when you joined during that timeframe? Yeah, so first of all, I think in this situation you need to make a couple of clear decisions early on and know what you want to do. So for us, decision number one was: we're going to build this on top of AWS. With all the pros and cons — but you cannot drag that decision out too long. So AWS was decision one. Decision two was: we are not building a new solution, cloud native, kind of Heroku-style; we are moving what we have to the cloud. And with that move, we move the customers together with us, and we offer a path into which we move you, and then we're going to modernize around you. That was strategic decision number two. With strategic decision number two — also because, imagine, by then we had like 18 years of existence, and as I said, a lot of our customers had built a lot of code on top of the platform — the value proposition could not be that you go to a customer and say, hey, I built this new thing, you need to reimplement. Besides the fact that the first question would be: does it have functional parity? And you don't reach functional parity on a new solution within a year when you've been building the old one for 18, right? So the strategic decision was that we needed to make sure we move this. And with that decision, automatically, three minutes later, looking at the options we had to do that, we thought of containers and Kubernetes. And so my next call was to Anoop. Anoop and I had been working together before this, in a different company, for a while. And then Anoop went into a stint of, how to say, leveling up his expertise on some platform aspects that he's going to talk about, I'm sure. And so then our paths reunited; I convinced him to join me in this crazy thing. So these were the, let's say, two or three steps, right? And then the next step was that, when you're asked to do something like this, there is never a magic investment that says: and here is a new team to do it. You've got what you've got, right? So the next key question was: we needed to decide everything that we would stop doing, because we needed to build and create a platform team that was going to build this infrastructure to run this opinionated flavor of Guidewire. And how do we do that, right? So that was the beginning of the journey. Yeah. And 2018 was certainly a sort of turning point in the market as well, you know, considering you were choosing a cloud platform — you were kind of thinking Kubernetes beyond 2018.
And, you know, I think in 2017 Mesosphere was still, like, a leading, you know, container orchestrator at the time. So generally pretty early, even though we're not that many years past when you started thinking about that. I do want to ask one follow-up, which is: was there a decision tree behind why you went with AWS over other hyperscalers? The decision was twofold. There was a portion of the decision that was a business decision, as easy as that. Sure. And then there was a secondary portion that was about maturity, and also how much maturity across the world, because we have different customers in different places. And so, you need to think in these terms: an insurer is similar to a bank. Their appetite for risk is close to none. And so we needed to pick a provider that was the most acceptable, quote unquote, from a carrier perspective, in combination with maturity and so on. Google was not an option. The only two options back then were Azure and Amazon, and Amazon was better in many respects. So that's the story. Back in 2018, I don't think it was much of a competition at that point. No, no, no. But we were also looking at where the ball was going, right? You don't look at where the ball is; you look at where the pass is going to be. And so we looked at a few things. We looked at the maturity of AWS at the beginning to support us in this journey, to a point that, again, Anoop will go into. But we had an ambition — Anoop had a dream, not a dream, but an appetite, and it's good to always have a sort of bounce-back of ideas with each other. Anoop wanted to go even more core, directly to Kubernetes and not EKS. And then we said, no, no, let's use AWS as much as we can, because we were at the beginning of our journey ourselves and we had almost zero knowledge in the company. So that is also another aspect. Yeah. Gotcha. So Anoop, let's talk about this dream a bit more. I'm just kidding. Since you lead the engineering team at Guidewire, right, can you talk about the responsibilities that come with it? I know platform — you listed a couple already — but how did you go through this, and what did the technology stack look like before the transformation, before you guys decided to move to AWS and Kubernetes?

Technical and Operational Challenges

00:18:33
Speaker
Yeah, so before we decided to move to Kubernetes — it wasn't as linear as we might think, right? We had ups and downs. We tried this, we tried that, had multiple failures, and then obviously a path emerged. But before we started, the first thing to realize is how things worked. We had an engineering team that was building the application platform as a product. And then we had an operations team, which interfaced with customers and would use the application platform that the customers would then install in their data centers. Then we had a nascent cloud practice, which was about running those things in VMs in AWS. So we had a little bit of familiarity, but not too much. And that was happening. What was broken in between was the feedback loop, right? Because we would ship something once in two years, Diego? Yeah, so we were shipping every two years. But again, imagine the stack is the usual stack. We were supporting two databases. We were supporting four application servers. But more important, just to connect the dots, our customers were building their own CI. They were picking their own code repository, their own CI/CD pipeline. They built their own mechanisms to deploy and promote to production. They were, of course, picking their database. They were, of course, picking their application server. They were doing a lot of those things. And then there was this thing, Guidewire, in their ecosystem, right? So again, just to put things in perspective, Anoop's team was the team to run the platform — the engineering team that he manages to run the infrastructure and the opinionated platform to run all these things in one flavor. One CI/CD pipeline, one database, one of each, right? That's what you get. When you get into the cloud, underneath, we use Aurora. No questions asked. That's what we use. As an application server, we had decided to use X. And then, in the context of that, how do we run that? Because our solution was not exactly a modern solution that was stateless and all those things, right? So Anoop's team was not the overall engineering team. It was the engineering team to answer: how do I take this monolith, and how can I run it in the most effective way across all the customers, in a way that I can operate it, update it, and run it? But going back to your specific question about the tech stack: it was mostly Java. And we had our own scripting language called Gosu. Okay. It's very similar to Kotlin, but started much before Kotlin. And then Tomcat as the app server; we were running on JDK 8, I believe. That was the tech stack, what it looked like at that point in time. And like Diego said, the customers would take all of that, run it in their systems, connect it to their databases, and apply customizations to their heart's content to differentiate themselves from the other carriers. No, I think that adds a lot of work, right? Because before you release the application components, you have to basically do all kinds of interoperability testing to make sure that they will work with the different databases that your customers are using. 100%. When I joined, at the beginning, more than 60% of the capacity of the entire engineering team was basically devoted to keeping up with the stack. Keeping up. Java 11 comes out. Oh my God. Java 11 comes out. We need to redo half the work, and so on. You sell into customers.
We had one customer where, basically, when we released the last release of the software, it was with Java 11. And this customer was running WebSphere, and they said, oh, WebSphere does not support Java 11 — but we are a WebSphere shop. You know, things like this, right? So imagine how many resources you waste on those things, right? Because that is a big customer, and you have to devote X amount of engineering to creating a specific version, and so on. So when we decided to build up the platform, our aim was: how do we minimize the number of paths to production? Make it simple, keep it simple, and give them options in place that are easy to use and easy for us to maintain. Because otherwise, you know, if you were to offer the same set of flavors across the board, that would be unmanageable for us to maintain and run. Yeah, with every new path it becomes more exponential — you know, more things that are going to change and more entropy in the system, which is just chaos for you. So I completely understand that. No, I just want to add: at some point, at some analyst meeting, there was a question about different things. And we looked at engineering — my team and I — and we looked at this as a great opportunity to modernize faster. Besides everything else of how do you do it, can you do it, can you not do it, and so on, we really looked at this as a great opportunity to modernize faster, to make the job of every engineer more fulfilling and exciting. Because, you know, ultimately what everybody wants is to say: hey, I pushed this thing to production today, and tonight this thing is available and all the customers are using it — you close the loop, and the feedback is great. And so, as Anoop was hinting, in the old world Guidewire was so unique that it was releasing a new version of the platform and the application every two years. Because the platform was changing, the application changed every two years too. Imagine that you release every two years. Imagine the business practice. Imagine product management that has a loop every two years. And then our customers were skipping every other release because it was too much to update. So now you're working on something that you deliver and that is going to be adopted four years from today. By then, the world has changed three times. So that, for us, was a great opportunity. Yeah, by today's standards, that'd be ancient, with those release cycles and things like that. Yeah, by the time we released, we were already ancient. We started on something two years back, and by the end of the release cycle, you know what? There's something new of that thing coming already. But also, the worst part is that that would mean the customers would then go do a lot of customizations. We release a new feature that adds on to something they already customized, and three-way merges were hell, right? Both for them and for us, because we need to be able to provide the features. And if they don't upgrade at the same pace as we're releasing, then that's a lot of wasted work. Yeah, I imagine the headaches that could come from there. So, I mean, at the end of the day, you came in and wanted to change this for your organization. You wanted to make this more efficient, to be able to release, you know, with more of a golden path, it sounds like. Now, this is a lot of change, not just technology-wise, but also for the organization. I heard you say, you know, you had to build with what you had in terms of engineering resources.
So I'm guessing that also means you had a lot of education to do internally, beyond just building and having a plan. So, you know, how did you drive sort of the buy-in and the approach at an organizational level to convince people to say, yeah, we have to do this? Okay. So this is always a complex thing. I could talk about this for hours, or days, maybe. Everybody has their own ideas and so on. For me, this was three things. Number one: this was going to shake the company seriously, to its foundation. So I needed to have support all the way up, because I told my boss back then: look, you're going to get noise, and I need you to have my back, and I need to spend the time in all the trenches and all the discussions and so on. So that was number one. Number two: I needed to have an organization that was structured in a way where knowledge and skills were at the epicenter, versus an organization where managers and politics were at the epicenter. So I went down an initial path of flattening the organization a lot. We re-transformed the entire engineering team into what we call pods — but pods where the leader of the pod is not the manager; it's sometimes a staff engineer with leading skills, sometimes an architect with lead skills. We have this concept of an L1 who runs the pod, but runs it from a technology-skills perspective. So we restructured the entire company like this. We reduced the number of managers, and we ended up having 70 pods. And at the beginning, we had a few meetings with those 70 people, in which I went there and said: look, you are the 70 most influential people in the company. I kind of don't like architects that are PowerPoint architects, that are kind of preaching. So we told everybody: if you want to be an architect, you are an L1. Get in, drive a pod, put your money where your mouth is — and your hands — and make it happen. And if it doesn't work, then you learn. And so that was one portion, right? Rebuilding the structure. And then, number three, I'm not a big believer that you hire the miracle worker and give him a bunch of capacity, because then you have a lot of pushback and so on. So the way that we did it, including with Anoop — and Anoop and I had worked together before, so he kind of had trust in me — I said: look, Anoop, you come, we're going to build a platform around you, but you come and you're going to be an L1 to begin. You're going to manage five. And then three months later, you're going to manage 10. And then six months later, you're going to manage 12. And the more I free up capacity from all the other things that we should not be doing, the more your team will get a little bit bigger. And I did that across two or three areas, exactly in that way. When you do things like this, yes, there is a little bit of noise of "we should not do this" and so on, but it's not immediately considered a threat, right? The antibodies are not woken up immediately, saying: oh, there is this thing coming, we need to shut it down immediately. It's considered like: oh, Anoop is doing this thing on authorization and Kubernetes. We started with Kubernetes and authorization — these are the two initial services that we built. But then slowly these things become a little bit more real, a little bit more real. And once you do it like that, people start to see more interesting things happening there.
And now you start to have people that, instead of fighting, start to say: can I join? Right? Can I work on that? And this also combined with attrition — every time somebody quit, because you have people saying, oh, this is not for me anymore, whatever — and with every new hire going through a different process, and so on. So Anoop had this kind of team that was made predominantly out of the volunteers — the "I want to join this thing" people — combined with the new hires. And then, with Anoop specifically: in our past life, we always loved the idea of doing pair development. And so Anoop came to me and said, I want to do pair development. And I said, sure, let's do pair development. And then the antibodies of the company pushed back immediately: no, no, pair development is not for me, bad idea. But we managed to do pair development — or actually, Anoop managed to do pair development in his team. So his team, to this day, still does pair development. The rest of the company does not. That was a huge aspect of knowledge and learning, right? Because now, with pair development and new blood willing to learn, pair development was a multiplier of that knowledge. So with teams like Anoop's that were doing, you know, pairing, and others that weren't, and you have people leaving, and you're hiring people on, and you're gathering people internally — what did the training look like? How did you get everybody trained on these new concepts and what you were trying to do? Was there a set system, or was it individualistic? Yeah, it wasn't easy. The one great thing about pair programming was that the diffusion of knowledge is much faster. So within my team, I was able to move fast there. Now, the question you asked: okay, what about the teams outside? We needed to build something that could build trust, right? You wanted people to start using those things. So I'll give you an example from my experience. I came in; Diego hired me and said, okay, build a platform. I built a platform in, I think, four or five months. We had a basic version, version one of what we call Atmos. Atmos is our platform as a service. It was brand new, sitting there. Nobody was using it. I was like, wow, we have an amazing platform — why isn't anybody using it? Then I went and did some user interviews with the developers there. And they said: hey, you want me to learn Kubernetes, you want me to learn Spring Boot, you want me to learn how to do auth in the new way, you want me to do observability in the new way — I have not done any of these things. What are you going to do about it? So we had two paths: spend a ton of money and time on training them, or make the barrier to entry for those things low, right? So we started another pod at that point, which was called Nova. Now we've re-christened it Polaris — the North Star, if you like. To build those templates out. So as a developer, I could go to that tool and say: you know what, I want a microservice, and this is my data model; build me a microservice. And it will generate for you an entire Spring Boot microservice that does authentication. So you have OpenAPI endpoints, Swagger, authentication properly built in, the cross-cutting concerns with regard to observability, Kubernetes, CI/CD — all of those things done. Persistence. Persistence — all of that baked in, generated. So now I could, as a developer, focus on the business logic.
Obviously, a CRUD endpoint is not enough for me; I need to write on top of it. So we did that. That was a huge hit. And we were able to do a lot, demonstrate a lot. Now, in the past six years, we have a lot of experts in Spring Boot and Java — Java they were already experts in. And they're already using Kubernetes to the extent possible. So that was one thing. Obviously, Diego also invested in training. We have O'Reilly subscriptions. We do regular trainings once in a while with third-party trainers, et cetera. And we have our own trainings, which we used to call safaris, where we would say: how do you start from one endpoint and take an entire use case to the end? But the key idea was to templatize this in the way that the developer thinks, right? If you give me a command line in which I run three commands and I get the starting template of an application, then of course, instead of saying "I don't want to use it," they're going to come back and say: this is not enough, I need these extra two things. And once they come back and say "it's not enough, I need these extra two things," you know that you've got them. Now you know that the conversation is different. So now, slowly, we had this team of Anoop's that was building all those services. And basically the services were authentication, there was this thing called Nova, there was Atmos, then we started to build something for API gateway, egress, and so on. And then we started to, quote unquote, infect the rest of the organization, so that when they needed to build a little app or a service, they were saying: oh, until yesterday, to build this service I needed to do everything myself; now I have a higher starting point. So you start to get to a point where things have a bit of a mini viral effect, right? You have a loop, and in the loop you start to gain momentum, and so on. And then there is a point at which, you know, in an organization like this — again, I don't believe that the general with the most stars goes in and says "do this, do this"; I believe more in something that is bottom-up. Now we had an organization with all these L1s who were engineers, skilled folks with detailed knowledge. We started to have a platform coming together. We started to have buy-in, slowly but surely. And then it took time. It took everybody to say: yes, we can transform this company to a cloud solution. It took time to win the belief of everybody. But slowly the percentage of believers increased. And the more that happened, the more you start to have somebody in any coffee chat or lunch break saying: no, no, but did you try this? This thing is cool. We can do this. We can do that. And then it starts to become contagious, in the positive way. Instead of, at the beginning — I've been in so many meetings where it was like: we cannot do that, it's not possible, forget about it, there is no chance, and so on. And you cannot change that as a leader.
You cannot be in all the meetings and break down all the arguments, right? Also because the arguments come from engineers who are smart; they are going to focus on the things that are not possible. And so that was kind of the change of the tide. No, I think it almost sounds like you described how you built a social network inside, like, this platform. Yeah, yeah. So, two other things that helped us in this training and ongoing conversation. The first was that I earmarked certain capacity for interrupts. So we had Slack channels where I had dedicated people supporting Nova, supporting Atmos, supporting authentication — supporting each of the products that we have. So when people ask, there is always somebody ready to answer a question. When that happens, that engagement helps. That was one thing. The second thing was that we did rotations. We rotated people into our teams and out. So they would come in, learn a few things, go out, and apply those things in their teams. Even within my own teams, I do rotations between the many teams that I have, every one year or so. We do rotations, and it helps get us out of that Stockholm syndrome of: just because we've done it this way all this while, this is the right way. No, we need to disrupt internally, right? We can't wait for somebody else to disrupt us from outside with innovation. I think those are great things that any organization can follow if they're going through a similar

Automation and Modernization Strategies

00:38:34
Speaker
journey. I did have a follow-up question on Atmos, but I wanted to give — I think Diego was about to say something. No, no, no. Go for it. Okay, perfect. So Anoop, we already described Atmos, and that's what the platform is called. I wanted to follow up on the templated code generation. Now, with the whole Gen AI wave, we see a lot of code generation using copilots and other tools. This was before that. How did you build it into the platform itself so that it gives you all the things that developers don't want to worry about — the authorization, the Kubernetes YAML files? How did you go through that journey? Yeah, so I used to work at Pivotal prior to this, and I used to be part of the Cloud Foundry R&D team. Over there, I learned of multiple tools, and many of the things that I'm doing here also borrow from their ideas, at least in terms of thinking. So there was a tool called JHipster. It's an open source tool, which can be helpful for creating templates for generation. So we borrowed from that, wrote our own wrappers on top of it, and wrote our own templates within it. It's mostly a template that is, I think, based on cookiecutter — I forget the exact templating engine that they use — but it was easy to create new templates inside of it. But then, over time, what happened was that we had customized it so much that we couldn't use the core engine anymore. So then we pivoted to Backstage. One great thing about using Backstage was that we could bring our templates over, and now we could enable our developers to create new templates if they want to, on top of whatever we have generated — plus give them more freedom to do other things that we don't support, that my teams don't support. So that self-service, the freedom to innovate and try things, is one of the biggest things that I value, and I think Diego does as well. And doing these things has helped gain a lot of converts to this thought process. And just two things to add. Number one, of course, we were there at the right time, meaning that around us, the ecosystem evolved. If you had tried to make the same success story five years earlier — go figure, it was not the right time. The other thing that was very important was to move the company away from the syndrome of "we build everything ourselves." In the stack that we had before, we had a mini version of everything, built ourselves. And we had those developers, those engineers, who came from that world — to the point that they built their own language, right? This Gosu is their own language. So with Anoop's team, we said from the very beginning: we use everything that we can. So I forced him to use AWS where available. He told me: but if it's not available, I go open source and use Go. And so we changed the mindset a little bit: keep an open eye on things, right? Adopt, learn, crash, move. And that also enabled us in the journey to mature, enabled us to become a little bit more visible in the open source world — to the point that, you know, we're ultimately having this call with you guys. Guidewire six years ago would never have had this conversation; everything was so proprietary that we couldn't have had a conversation about these concepts. Gotcha.
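For readers unfamiliar with Backstage, the software templates described here are YAML scaffolder definitions: parameters the developer fills in, plus steps that generate and publish the project. Below is a minimal sketch of what such a template might look like — the names, parameters, and skeleton path are illustrative assumptions, not Guidewire's actual Nova/Polaris templates:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: springboot-microservice      # hypothetical template name
  title: Spring Boot Microservice
spec:
  owner: platform-team               # hypothetical owning team
  type: service
  parameters:
    - title: Service details
      required:
        - serviceName
      properties:
        serviceName:
          title: Service name
          type: string
          description: Name of the new microservice
  steps:
    - id: fetch
      name: Generate project skeleton
      action: fetch:template          # standard Backstage scaffolder action
      input:
        url: ./skeleton               # skeleton with auth, observability, CI/CD pre-wired
        values:
          serviceName: ${{ parameters.serviceName }}
    - id: publish
      name: Publish to Git
      action: publish:github          # standard Backstage scaffolder action
      input:
        repoUrl: github.com?owner=example-org&repo=${{ parameters.serviceName }}
```

A developer picking this template in the Backstage UI would fill in the service name and get a fresh repository with the skeleton's cross-cutting concerns already in place, which matches the "focus on the business logic" experience described above.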
No, so Anoop, one more follow-up question on Atmos, right? So Backstage is one component, AWS is another. What are some of the other components? What does Atmos look like? Where do you run it? And things like that — if you can share some more details around the platform itself. Sure, absolutely. So, as a developer, prior to having our platform, everybody would run their own AWS accounts, subaccounts, and EC2 instances, deploy things, test them, and then we'd release software assuming it would all work somewhere with some customer. Now the interface is an Atmos CLI through which, when I run those commands, I log into a Kubernetes cluster. And based on my authorization model, I get the namespaces to which I have access. I can deploy things in there. And while deploying, we have a lot of governance templates and webhooks that validate: have you put in the right things? Have you put the right labels for cost attribution, for security attribution? And as part of the pipeline, on push, we do a bunch of checks using Twistlock, Orca, and others to figure out how the containers' vulnerabilities are looking. Once in, networking is automatically handled for you. We use multidimensional things. We'd like to do A/B testing, so we're still in the process right now of deciding which ingress to adopt. We mostly use Kong as one ingress. We are also maintainers of an open source project, along with Alibaba and other folks, called KubeVela. If you guys have heard of it, it's a way of writing — it's an open application model. So I can describe my application, saying: hey, this is my application; this application depends on this database — on a Postgres database, on a DynamoDB store, and, let's say, an SQS queue, right? And behind the scenes, what it does is abstract out the creation of all of those resources for us. We can then delegate that down to Crossplane or native Terraform controllers and all of those things. So as a developer, I don't need to know about any of those things. All I say is: I need Postgres, right? Why do you care whether it's in AWS or it's a containerized Postgres that we're running in the Kubernetes cluster itself? So all of those things are abstracted, and I can focus more on the things where I can bring value, which is building out the application and its purpose. We have a lot of operators that we've written ourselves. Some of them do a lot of orchestration for the huge monolith that Diego was talking about, with regard to our insurance application platform. But with the microservices, we have a lot of leeway there. We work with Red Hat and others on building out open source Open Cluster Management, which is a way to do application placement and DR activity. That's another thing that the platform gives to a developer automatically. These are all things in various stages of development; some of them are already done. But generally, this is what the platform looks like to a developer.
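To make the open application model concrete: a KubeVela Application is a single YAML document declaring components, and the platform resolves each component type to real infrastructure. A minimal sketch of the kind of description Anoop mentions — the application and component names are hypothetical, and the `postgres` type assumes a custom ComponentDefinition backed by something like Crossplane or a Terraform controller, as he describes:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: claims-service                # hypothetical application name
spec:
  components:
    - name: claims-api
      type: webservice                # built-in KubeVela component type
      properties:
        image: registry.example.com/claims-api:1.0.0
        ports:
          - port: 8080
            expose: true
    - name: claims-db
      type: postgres                  # assumes a custom ComponentDefinition exists
      properties:
        storageGB: 20
```

The developer only declares that the application needs Postgres; whether that resolves to Aurora in AWS or a containerized Postgres in the cluster is a platform decision — exactly the abstraction being described.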
And I just want to add one thing. The complexity was a little bit like fixing the plane while flying the plane, in some ways, right? That's the analogy we use internally. It's like you're in the air and you need to tweak something. So on one end, we were building this platform to run this kind of huge monolith in an efficient way and so on. At the same time, we needed to evolve the application that was part of the monolith, because we always had the requirement from the customer perspective that says: hey, I want to change the algorithm that determines the rating of your policy, let's say. You have a policy and I want to change it a little bit, because we have this new idea that if you are, whatever, 18 and you have a degree or not, you're a safer driver or not, let's say. And when you want to make that change in the monolith, you need to redeploy the full thing. So there was a business need for: I want to have the rating logic outside of the core. So now we enter this world of: how do we do this as part of our target architecture? And our target architecture was what we call hybrid tenancy. We had this idea that we had the monolith, which is going to run single-tenant on one end, but the monolith had an awareness that it could run in combination with multi-tenant microservices. And then we wanted to create a journey to move things into these external microservices — that could be authentication, or rating, or a couple of other things. So now, what Anoop needed to do from an orchestrator perspective, when you deploy and so on, is to have an awareness that: oh, I'm deploying this thing single-tenant here, and this thing has implications for the multi-tenant world. And when this happens, that has to happen. And when I promote this, the promotion from environment A to environment B has dependencies. So we ended up building, as I said, a lot of things that are specific to extending the life of our product — the successful product — and to building, around that successful product, a journey of modernization, right? And, being a publicly traded company, you need to continue to sell, continue to convince customers, and so on. So the pitch of the company was: number one, we need to find an architectural way to do that. Number two, we needed to continue to sell during the journey — not easy. And number three, we needed to do this at a margin profile that will succeed, right? That's why the work of Anoop's team was fundamental: to figure out a way to run this efficiently on AWS, for an application that traditionally was, as I said, stateful, with all the dependencies that statefulness has from a cost point of view. So we applied what is called the Strangler Fig pattern, from Martin Fowler. That is: you take one piece of the business logic out, make that multi-tenant, and you slowly shrink the core, which gives you economies of scale in how much you can run and how much you need. The more things you have in the multi-tenant services, the smaller your core is going to be. Otherwise, running a core in clustered mode with one terabyte of memory and everything is not tenable. Yeah. So I guess one of the questions I have is: how much of that monolith is still intact — maybe a guesstimate or a percentage? And maybe you can speak to some of the best practices or lessons learned applying that strangler pattern, right? When you were pulling things away from that monolith or adding a service outside of it. So, I would say around 80% of the monolith is still there. What we stopped doing is building more into the monolith. I don't think that, again, gives the right picture, because everything we're adding is on the outside — it's never inside. We have to remember, we are a business. Our aim is to make sure we connect to our customers and give them the value. We have a lot of these assets that have been built over the last 20 years.
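At the traffic level, the Strangler Fig pattern discussed here often comes down to path-based routing: the carved-out capability is served by a new microservice behind the same entry point, while everything else still reaches the monolith. A minimal sketch using a plain Kubernetes Ingress — hostnames, paths, and service names are hypothetical, and Guidewire's actual setup uses Kong plus per-tenant orchestration that a sketch like this does not capture:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: strangler-routing
spec:
  rules:
    - host: insurance.example.com
      http:
        paths:
          - path: /rating             # carved-out capability, served by the new microservice
            pathType: Prefix
            backend:
              service:
                name: rating-service
                port:
                  number: 8080
          - path: /                   # everything else still goes to the monolith
            pathType: Prefix
            backend:
              service:
                name: monolith
                port:
                  number: 8080
```

As more capabilities move behind their own routes, the catch-all rule serves less and less traffic — the "slowly shrink the core" effect Diego describes.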
We need to make sure there's no disruption to that asset usage. And at the same time, when it makes sense for us architecturally and business-wise, we are looking at ways to push things out. So, two things that weren't there before. One: the monolith could connect to any number of integration endpoints outside, right? For example, the DMV, or connecting to, you know, maybe an address book or other things. Now we have what is called an integration gateway, through which you can run small integrations outside of the monolith. Earlier, those would have been written into the monolith, but now they can all be written outside, run on a
00:50:51
Speaker
modern stack there, and hosted on our platform much more easily. Nice. Similarly, we have functions
00:51:00
Speaker
as a service, so our customers can write extension points as functions within the platform, right? Those are things that were never there — or they were there in some flavor. But as we move forward, as new customers start adopting these things, the core's need for those components will continue to shrink. So the path might be long — we still need to use what we have — but the alternatives are already available or in the process of being built. Yeah, so the key decision was always to frame it as a feature of the system rather than a limitation. The feature was: we want to evolve the system to run in two modalities. You run a lot of things in the monolith; you're happy with those things; you don't have a business need to change them — I'm not going to force you to change them. You are going to keep the limitations that you have, right? If you want to change your rates, you need to redeploy. I'm going to make the redeployment easier, faster, quicker. I'm going to shrink the core in a way that makes that loop a little bit faster, but ultimately the core is there, right? I can build a nice blue-green environment in which you can flip the switch in a second and go to the new system, but it takes you time to rebuild everything and so on. But that functionality is there. So if you have a portion of your business that doesn't change much, you're happy, the business does not require change — good. At the same time, if you come to me and say: oh, but to cope with the market I need to change the rates six times between now and Sunday — then we have a new pattern, and we can say: oh, but you can configure the system this way now, and you can start to build those things in this alternative. And so that became a big portion of the success of the transformation, right? Instead of pushing you down and forcing you to adopt something, we have more of a pull situation. This is available; if you see enough business value to make the change and implement things differently, you have an option. And so gradually things shift, and customers pick based on their business needs rather than on pure architectural, how to say, optimization. Okay. No, thank you for walking us through that, Diego. That was super helpful. One question that's relevant to the times we are in right now, in 2024: how are both of you, at Guidewire or around Atmos, thinking about Gen AI, about incorporating LLMs into Atmos? What does that look like? How do you get to the

Future Innovations at Guidewire

00:53:44
Speaker
next phase? And we meet again in a year or two and talk about how you have used AI in your stack. Do you want to start, Diego? No, you go first. So there are two parts to that question, right? Gen AI is the talk of the town everywhere. It needs to make sense for us in the context of what we're doing, right? So there is tremendous value for Gen AI in the application space, definitely, but along with that come a lot of concerns — privacy, PII, all of those things that we have to, you know, make sure work out. We do have another team that is working on some of these things. We have what is called Gen AI Connect — think of it like a gateway for connecting to different types of models within the platform. That's work in progress, which means that your application can use the gateway and the connect to reach different types of models for different types of needs, say claim summarization or underwriting aspects. Those are the application side of things. Then, when I talk specifically about Atmos: like I said, I spend a lot of time on interrupts, right? The channels of my teams are probably some of the most trafficked Slack channels in terms of questions. How do I reduce that toil? How do I increase the accessibility of information? There's a lot of the productivity aspect there. So we are already working with certain models. We're working with a company called Glean that does a lot of question answering. Glean is awesome, man. Yeah, it goes through all of our docs. And when somebody asks an interrupt question, the first thing we did is delegate it to Glean, and Glean gives an answer. And if it's not satisfactory, then it pings one of us in the team to say: hey, the answer wasn't satisfactory. So that's level one. Level two is where I want observability and autonomous activity. There's a blog post I've got about 80% done about how we react to events and how we observe things in production. Right now, with Datadog, et cetera, we have a lot of correlation; it gives us a lot of events, et cetera. That's awesome. I would want us to go to the next phase, where those events are pumped into an AI model trained specifically on our past events and the actions that we've taken, which then comes back and tells us: hey, we think this is going to go bad; this is probably the set of actions that you need to take to fix it. Level one, right? And I take actions through that UI, and over time it learns the actions that I've taken. And then the next stage — level two or three, I think I put in the blog — is: hey, Anoop, I found this issue, I fixed it for you, this is what I did, and this is how the observability status looks now. So that is on the productivity side. I'm sure there are more things on the app side that Diego can probably talk more about. On the level one, there are a couple of things we're doing right now — going back to our transformation and challenge. So, Anoop said we have introduced this integration gateway, Apache Camel-based, where things run outside the monolith and so on. But traditionally, our customers, if they had a typical claims system, had 50 to 80 integrations, and a big part of the work was writing those integrations. Those integrations were part of the monolith. So now we are using what we call Gen AI Connect as a mechanism to rebuild those — to convert those integrations into the modern flavor.
Because we said: okay, we have this very sophisticated cloud API now; when we need to make an integration with another system that has modern APIs, with Swagger and so on, this is a perfect task for Gen AI. That kind of code is, okay, mapping code and so on. So we created a piece of functionality that goes to our customers and says: hey, we noticed that you have 55 integrations — do you want to convert them to the new mechanism? If you want to convert them, here's a button to convert them. Of course, customers will say: how safe is this? I said, yeah, sure — if you converted them with a bunch of developers, there would also be some mistakes. So it's a nice first pass. Similarly, you could convert some aspects of rating, which I discussed a second ago. So we are using it right now as a tool for conversion. As you know, Amazon went out with this thing that says: hey, if you need to move your Java version... and so on. So I think, for what Anoop calls level two and level three, we don't need to be a first mover. There are some companies that are way more threatened by Gen AI than Guidewire is. We want to make sure that right now we are using it in a way that, in every direction, can augment our capacity, right? For jobs that — jobs, not jobs like hiring jobs, you know, computing jobs — for tasks, for tasks that can be uplifted by Gen AI support, right? And horizontally across the platform, we try to apply that concept all the time: can Gen AI help you do this a little bit more efficiently? Absolutely. And we've had other guests on the show, other practitioners, who talk about how they're using it very similarly — as a productivity tool, a way to increase efficiency, right? Not necessarily all the way at the other end of the spectrum, where it's solving things for you and doing things for you. I'm sure there are many use cases that fit better into that scenario, but I think for the most part, in terms of AIOps and development and code snippets and stuff like that, it definitely is, hopefully, making an individual task, like you said, a little easier to accomplish. You know, I think we could probably — with as much stuff as it seems is going on at Guidewire, and I've learned so much in the last hour or so — we do want to wrap it up. So I do want to ask one last question, maybe to both of you, which is: where can people find more about what you're up to? And it sounds like there's a lot going on, so "are you hiring?" would be a great question as well. Yes, we are. Yes, we are. Yes, we are. We've been hiring constantly through this process. We are not hiring at a crazy speed, but we are hiring. Sure, yeah. Across all our development locations — we have predominantly six locations in the world: San Mateo in the Bay Area, Toronto, Dublin, Krakow in Poland, and India. Those are the main ones. We have an interesting hybrid in-office policy that most of our developers enjoy — on any given day there is always some pushback, but we think there is a good in-between compromise. There are a bunch of blogs around about a few things that we're doing in the innovation space. And you can also find some videos on YouTube of me and Anoop talking about some of the sweet things we did. Yeah, well, we'll make sure to follow up and get those links and stuff from you, as well as where someone can reach out if they're interested in applying, and those kinds of things.
But again, I think it's clear we're going to have to have you back on the show because there's so much more to talk about. We crammed a lot into the last 50 minutes or so, but I hope our listeners found it as useful and interesting as I did. I

Wrap-Up and Reflections

01:01:59
Speaker
know I did. So I just wanted to thank both of you, Diego and Anoop, for coming on the show — and thanks for being a listener, most importantly. Thank you for having us. And I just want to use your medium to say we're doing a lot more interesting stuff than we've just discussed. Sure. Obviously, if people are interested, we're happy to hear from them. We also have open source contributions, under Guidewire OSS on GitHub. Check it out. We just started. Yeah, we'd love to hear from you. Awesome. We'll put all those links in the show notes. Thanks, Anoop. Thanks, Diego. Thanks a bunch, guys. Have a good day. Have a good weekend. Cheers. All right, Bhavin. That was a great conversation with Anoop and Diego. I did not know how much was going on at Guidewire — and really didn't know much about Guidewire at all until we talked to Anoop and Diego. I'd be curious: what were your feedback and takeaways from that conversation? No, I agree with you, right? Like, Guidewire — I don't know if they're a CNCF end user or end-customer sponsor or not, but when you think about Kubernetes, you think about Mercedes-Benz and Audi and those guys that have sponsored KubeCons, and Home Depot, and things like that. But Guidewire, low-key, man, those guys are doing some amazing work. Taking something that's super critical to people's livelihoods, like insurance claims, especially in the times that we are in right now, and making sure that they go through this modernization journey without impacting their main code base. I like how they walked us through their approach, going back four years, five years, how they originally started this idea and built these pods. I know it was cheeky, since it's also Kubernetes pods, but building these pods inside your organization, and having a rotation program. I think it's a good story for everybody who's going through a similar journey. Maybe they're having some challenges; maybe they'll get some ideas from this episode. I know we didn't go into a lot of technical details like we do in other episodes, but I think even from an organizational perspective, this was a nice episode. Yeah, I mean, I like these episodes that aren't necessarily all about the tech, although we did get into it a little bit here and there. And, you know, I didn't know much about the company, but I really appreciated the conversations around the change in the organization, right? How do you train individuals or teams? And can individuals and different teams actually run in different ways? That was one of the outcomes that I was surprised by — that Anoop's team operated a little differently, with pair programming, than the others, but they were still part of the same company, still kind of part of the same platform, so to speak. And also, we don't talk a lot about how to take apart a monolith, right? This was something that, when microservices were really coming on, there was a lot of research, a lot of blog posts, a lot of stuff about how to peel back the onion. They were using the strangler pattern, which is what I was familiar with. I'm like, oh — remembering how some of these big companies with these monolith applications are taking an approach. That's why I kind of had the question around, you know, how much of that still exists. Because I would consider Guidewire a success story, and I'm sure they would probably tell us the same. And I think it was 80%, they said. Yes, 80% of the monolith.
And so that, I mean, that alone can tell you: I don't think you should, from the get-go, tell your leaders, your CIO or your CTO, that you're going to get rid of the monolith. You're just not going to go in and spend five years working only on that. Yeah, or you don't have to, right? You can peel back the parts one at a time — the parts that work, that best fit the scenario or the use case. So yeah, that really stuck out to me. And it was, I think, a really awesome conversation. No, I definitely appreciated the time. I know it was a long episode from an interview perspective, so we'll keep this short. But yeah, it was a great episode. All right, cool. Well, that brings us to the end of another episode of Kubernetes Bytes. I'm Ryan. I'm Bhavin. And thanks for listening. Thank you for listening to the Kubernetes Bytes podcast.