John Grubb - Making FinOps Work Like an Engineer

S1 E7 · Straight Data Talk
34 Plays3 months ago

John Grubb is the Sr. Director of FinOps and Cost Modeling at Platform.sh. With experience as a former Data Platform Director, Director of BI & Analytics, and Director of Customer Care, John brings a sharp perspective on why cloud costs matter. He knows how to align financial and engineering teams and believes that FinOps is about maximizing the value of every cloud dollar rather than just cutting costs.
Follow John on LinkedIn - https://www.linkedin.com/in/johnnygrubb/
John's blog - https://www.thefinoperator.com/

Transcript

Introduction to Straight Data Talk

00:00:01
Speaker
Hi, I'm Yuliia Tkachova, CEO and co-founder at Masthead Data. Hi, I'm Scott Hirleman. I'm a data industry analyst and consultant and the host of Data Mesh Radio.

Inviting Data Practitioners to Share Experiences

00:00:10
Speaker
We're launching a podcast called Straight Data Talk, and it's all about hype in the data field and how this hype actually meets reality. We invite interesting guests, who are first of all data practitioners, to tell us their stories: how they are putting data into action and extracting value from it. But we also want to learn about their wins and troubles, as a matter of fact.

Conversations Behind Closed Doors

00:00:32
Speaker
And as Yuliia said, we're talking with these really interesting folks that a lot of people don't necessarily have access to. These awesome conversations are typically happening behind closed doors, and so we want to take those wins and losses, those struggles, as well as the big value that they're getting, and bring those to light so that others can learn from them. And we're going to work to distill those down into insights so that you can take these amazing learnings from these really interesting and fun people and apply them to your own organizations to drive significant value from data. Yeah, so every conversation is unscripted and held in a very friendly and casual way. So yes, this is us. Meet our next guest. And yeah, I'm very excited.
00:01:19
Speaker
Hi, all.

Guest Introduction: John Grubb

00:01:20
Speaker
We're back. It's Straight Data Talk, Yuliia together with Scott. And today we have the pleasure to host John Grubb, Senior Director of FinOps at Platform.sh. Is that correct, John? It is correct. Okay.

Career Transition: Sales Engineer to FinOps Director

00:01:40
Speaker
It's good to know. Okay, so a little bit of history: how I met John. I mean, it sounds, you know, serious. How I met John.
00:01:50
Speaker
Okay, so, long story short, I find myself a very straightforward person, you know, maybe it's lack of English, whatever. But what I noticed on LinkedIn is that John behaves like me when he comments on people's posts, or, as Scott says, giving a shit, giving shit to people, like being super direct. And I enjoyed that so much from John. And I just, you know, hit him with a DM saying, John, I feel like we need to get acquainted, because we've been commenting on each other's posts for a long time and enjoying each other's comments. And this is how we ended up meeting. And then we did

Passion for Infrastructure and DevOps

00:02:30
Speaker
a super nice talk where John told me about his journey from being a sales engineer to ending up in a position at his current company as a senior director of FinOps practices, very much involved in FinOps for data.
00:02:48
Speaker
Okay, I'm wrapping up here, John. Go ahead, give us an intro, and you can brag.
00:02:58
Speaker
I don't know. I'm in less of a bragging place these days. But yeah, I mean, you fairly accurately described the journey. So I've been at my current employer, Platform.sh. We're a platform-as-a-service company. We've been around for a while. We're a French startup. Well, we were a French startup. Now, I guess we're technically somewhere beyond startup. I started here, I think my eighth anniversary is actually going to be sometime this week. I'm not sure exactly, you know, it's soon. So I've been there for eight years. I'm trying to send you balloons, you know, sometimes so that people do it. Yeah. That's the most alarming thing. I turned that off immediately.
00:03:36
Speaker
Yeah, cool. No, I've been here for eight years. My previous employment was basically as a sort of generalist web developer. I started off on the front end of things, slowly moved into the back end of things, and just kept moving further and further back into, like, infrastructure

Role of Sales Engineering in Customer Success

00:03:52
Speaker
and DevOps. And I was kind of like the infrastructure person at my previous employer. I had automated all of our AWS stuff and all of our other stuff with Ansible back when Ansible was a new thing, you know, and was managing all of our Varnish configs and our Memcached cluster. I was just really into infrastructure and making things fast, you know, and that just happened to be a good training ground for this gig. I wanted to work a little bit more with customers and people. I wanted to move, like, a little
00:04:23
Speaker
closer to the front of house, if that means anything to you, and this opportunity opened up here at this place, Platform.sh. That was a company that sort of grew out of the Drupal space. Drupal is an open source CMS that's still around but was very, very hot around, you know, '09, 2010, 2012, and that was the technology with which I made my living in those years. And some of the big contributors and well-reputed, you know, reputable people from that community were working here at Platform.sh. It was only like 30 people. And so I was just kind of like, you know, the shape of this opportunity looked cool. I didn't necessarily care what exactly I was doing.
00:05:07
Speaker
I kind of just wanted to work with this group of people, and the sales engineer job just happened to be one that they had posted. It was like a laundry list of all of my qualifications, you know, experience with all this infrastructure, and then knowledge of the sales process, of which I had none. That was the one we had to fill in, you know, and that was how I got my foot in the door here. I was employee number 30 or 30-ish here in 2016. And so I did that for a few years and learned a lot about extracting information from people, customers specifically, who would come with, you know, problems and imagined solutions in their heads. So you have to take those imagined solutions from these technical people who own these applications
00:05:47
Speaker
and then the knowledge of my platform and the company I work with and the solutions we offer, and figure out how to steer them together, you know. I really enjoyed it. It was fun being able to jump into a brand new situation, diagnose what the need actually was, help set expectations, and communicate with people you'd never met before, technical people, and help them understand the problems you can solve and how you can help solve their problems. It was kind of a blast, but I wasn't sure where it was going.

Initiating a Data Platform Team

00:06:20
Speaker
So this opportunity to sort of move up into a management position as the head of our customer success team opened up after about two years. And so I asked the former head of that if I could have her old job, and she was like, heck yeah,
00:06:33
Speaker
because, you know, a lot of the relationships that I would be managing were customers that I had brought in in the two years previous. So it was kind of a natural fit. I already knew the relationships and architecture and technology [inaudible]. That was a fun leg, except for the visibility that I had in that position. Customer success is a business-spanning sort of position, and at that point in time we didn't have a holistic BI effort. There was visibility into specific data silos around the company, but we didn't have anything that tied it together, and my customers would pop up with problems all over the place. And the ideal picture in probably every customer success professional's head is being able to understand or know or be alerted that a customer is in trouble before they tell you, you know. And that's just not a thing you can do without the data being, yeah, right. [inaudible]
00:07:27
Speaker
And so I just sat and stared at this problem and dealt with this problem for several quarters before finally I was like, can we please do something about this? And they said yes. So that was the founding moment of the data platform team here at my current employer. This was at the beginning of the pandemic, so, you know, H1 of 2020 was when this was all happening. And so that was sort of my reintroduction. I had a previous leg with data and data warehousing at the previous employer, but this was my reintroduction four years later.

Need for Evangelism in Data Platforms

00:08:02
Speaker
And the world had moved on to where cloud warehouses were a thing. You know, the most advanced thing
00:08:07
Speaker
in the world previously, in my previous leg, was Redshift. That was what we were getting ready to move to. And so, you know, Snowflake, Databricks, BigQuery, all these things had come along in the meantime. Informatica, and whatever SQL Server's move-your-data-around service was called, that was what my previous employer was using. And so the whole job of actually architecting and building out a data platform had gotten a lot easier in the years since, you know. It was kind of just a period of surveying the landscape, picking the right pieces, and understanding the business problems we were trying to solve in order to pick which ones to bring into the house. That was the technical job. I don't want to say it was done, but it was 80 percent done after about six months.
00:08:50
Speaker
And then, I think we talked about this previously, Yuliia, I was really stunned at that point to realize that nobody actually cared, you know. It was like, if you build it, nobody actually will care. So that began my introduction to the real job. And the real job is evangelizing, you know, helping every single user who shows the faintest spark of interest in improving their work life and understanding data, helping nurture that little spark into something, one by one by one by one,
00:09:26
Speaker
you know, because the business is not sitting waiting for you to solve this problem. The business is just moving along, doing the thing that it does and has always done. And so, you know, that was the next and probably the majority chunk of my time as the director of the data platform team.

Optimizing Cloud Costs

00:09:43
Speaker
You know, I hired people and we had a team built out, and we had this very specific need that was still sitting there this whole time, which was better, much better visibility into all of our cloud costs. We are a platform as a service. We host our customers' production workloads on AWS, Azure, GCP. Over in Europe, we host them on OVH, on Orange, and, you know, we have a theoretical capability to host on basically any OpenStack-compliant infrastructure-as-a-service provider.
00:10:17
Speaker
And so we have done that. And it's this technical marvel that we can deploy anywhere, but we didn't seriously question the business marvel that was going to have to be built out in order to understand where all this money was going with all these different cloud providers as we were standing up production workloads on all of them. And so that's really the genesis of FinOps here at Platform.sh. I was not the first in the role, but I kind of inherited this responsibility two, two and a half years ago and started basically re-architecting the entire thing from a data point of view. It was always driven from the finance team. And fortunately... yeah, go ahead. No, no, no. I just agree that it's very interesting. This one is really interesting, that you said it was owned by the finance team and now you inherited it. How was it?
00:11:10
Speaker
Yeah, I mean, honestly, it was a little bit contentious at first, you know, because I was basically taking this thing that was owned by another team and I was just kind of redoing it the way that I thought was right. And that tends to rub people the wrong way, you know. Par for the course, I guess, with the introduction you gave me. But yeah, you know, we tried it their way, and it was just still this very, very difficult relationship. And so eventually the old team that was in charge of it just sort of quit and moved on to new jobs. And then I was left fully holding the bag, or in charge, depending on how you want to look at it, of this, you know, deep data discipline with this very specific and highly visible and valuable subject.
00:11:59
Speaker
Okay, I have a question to clarify. So right now you do more data FinOps, not infrastructure FinOps. Am I

FinOps: Understanding Infrastructure and Data Costs

00:12:08
Speaker
correct? I would say a combination. I mean, it's a combination because the data is from the infrastructure, you know? I know. But it feels very different to do data FinOps and infrastructure FinOps. Like, you can, you know, shut down some... Oh, I see what you're asking. So we're a BigQuery shop, and you're saying most of my focus is on the BigQuery costs. Actually, no, that's not true at all. That's a side piece I haven't even really gotten to yet. That's a big bucket of optimization, a problem that nobody's noticed yet, that I've kind of got in the
00:12:39
Speaker
hopper for later. You know how you make me feel right now? I don't know if I was this excited over the last month. Yeah, it's a BigQuery problem. I don't know if any of this is fit for public release, actually, but anyway, it's not a huge problem. There are some optimizations that we've identified, but they're not the number one thing because we are an infrastructure-heavy shop. Our infrastructure COGS, cost of goods sold, are our number one category of cost of goods sold. And so any and every bit that we can optimize those and shrink our infrastructure footprint is money in the bank for all of us, you know. And so it's mostly infrastructure-focused.
00:13:22
Speaker
Okay, so the question is, what is the distribution, percentage-wise, between the cost of the infrastructure for you guys and the data cost, the data platform cost?

Cost Management Strategies

00:13:32
Speaker
It's probably in the neighborhood of like 95/5. We've got a pretty significant data footprint, but our infrastructure footprint is the vast majority of our cloud bills. I mean, that's common. And that's one of those things, especially if you're hosting for other people, right? It may be that some of what you're hosting for them is data platform stuff, but it's not your internal data platform. If you're paying the bills for it, this is that whole fun thing of PaaS versus SaaS, right? If you're a platform as a service and it's not run in their own account, if it's running in your account, then, you know, the numbers get... And so.
00:14:18
Speaker
So it was really interesting. I had a call with, um, cringe Shlomo Goldenberg, and she was talking about how they have a lot of data that's internal versus external, right? And when they're showing that external stuff to people, they're like, these are your costs and this is your thing. And people were really, really on board with that. But when you're thinking about your internal data, not data platform, but even your internal data about what you're spending, you know, when I was doing this at a company that was going public, the finance team at first really, really cared because our cloud bills were going sky high. I came in and managed to stabilize them, and they were just like, okay, Scott, we want to know where we're spending money and why, but
00:15:01
Speaker
you know, I started giving them a weekly forecast update as to this month and the next three months and then kind of the rest of the year. And so, you know, I'd be like, hey, this project that we thought was going to roll out this week is actually going to roll out in three weeks. So our cloud bill this month is going to be well below expectations, but next month it's the exact same expectation from a dollar standpoint as it was. And, you know, that idea, and then just allocating it into the cost buckets, and having a thing where the auditors came and asked me, you know, like three questions and then ran away screaming. And they were like, don't ever have me talk to that person again. That guy knows way too much about auditing cloud bills.
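The forecast arithmetic Scott describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `monthly_forecast`, the 30-day month, and all dollar figures are hypothetical and do not come from the episode.

```python
DAYS_PER_MONTH = 30  # simplifying assumption: every month has 30 days


def monthly_forecast(baseline, projects, months):
    """Forecast cloud spend per month.

    baseline: recurring monthly spend before any new projects.
    projects: list of (start_day, monthly_cost) tuples, where start_day is
              the rollout day counted from the start of month 0.
    months:   number of months to forecast.
    """
    forecast = []
    for m in range(months):
        month_start = m * DAYS_PER_MONTH
        month_end = month_start + DAYS_PER_MONTH
        cost = baseline
        for start_day, monthly_cost in projects:
            # Charge the project only for the days it is live in this month.
            active_days = max(0, month_end - max(start_day, month_start))
            cost += monthly_cost * active_days / DAYS_PER_MONTH
        forecast.append(cost)
    return forecast


# Hypothetical numbers: the rollout slips three weeks, from day 9 to day 30.
on_time = monthly_forecast(100_000, [(9, 30_000)], months=2)
delayed = monthly_forecast(100_000, [(30, 30_000)], months=2)
```

With these made-up figures, the delayed plan lowers the current month's forecast while next month's stays the same, which is the update finance would get in the weekly call.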
00:15:35
Speaker
Right, and so, what have you found is the thing that people care

Translating Costs for Engineering Teams

00:15:40
Speaker
about the most? Is it managing the surprises? Is it managing the bills? I mean, do your customers pay a set amount, and if you can cut your costs, that's 100% into your pocket, or is it somewhat into theirs? When they were asking me to cut costs, it was because they wanted to significantly lower the cost that we were able to offer things at, so we kept margins the same but were able to significantly grow that. So how are you intersecting with the actual strategy rather than just being like, our cost per query is this, or our cost for this customer is this? How are you taking your data background and intersecting with what they actually care about versus what they tell you they care about, which is just spend less money? Yeah, really good question. Sorry, like a 19-part question. I have that habit.
00:16:28
Speaker
Yeah. Well, I always just pick the last part. You know what I mean? I can't walk back. You agreed to do so. Okay. Well, let me see if I can try and answer it this way. We own all of the cost of goods sold. Our customers pay us a dollar amount for the thing that they buy, and then we handle the hardware for them, so they never see their footprint. I mean, it's connected to our costs because, you know, what we sell is platform as a service. It's sort of like Heroku; we look a lot like Heroku on the surface. So you can buy X number of dynos to expand your computing power, and you can plug it into this and that, you know. And so
00:17:07
Speaker
you do have control over the underlying hardware spend, but they never see it, you know. So effectively, any amount that we can optimize the underlying hardware spend is money in the bank for us. So it is highly connected, especially at this point. We did our last raise in 2022. We got it in right under the bell before they started raising interest rates. And then since they started raising interest rates, you know, that's how it's been for the last few years. We've all seen it. The technology landscape in general has changed quite a lot, right? And so growth shifted to optimization
00:17:43
Speaker
very quickly, from the mouths of our investors anyway, you know. And so it's highly visible in that way. The way I view this job, or have viewed this job for the last several quarters, is that we get these cloud bills that are the same basic shape from different vendors, but they're different. They have different vocabulary, you know. Azure's bill has its terms, and AWS's bill has its terms for the same concepts, and GCP, and so on. And they're modeled differently, you know. And so that's problem number one. You have to figure out a common language and merge them to be able to understand the value that you're getting per cloud, you know, across clouds.
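As a rough illustration of "figuring out a common language," the sketch below maps vendor-specific billing columns onto one shared row shape. It is not Platform.sh's pipeline. The `CostRow` fields and helper names are hypothetical, and the vendor column names are assumptions modeled on typical billing exports, so check them against your own exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRow:
    """One billing line in a shared vocabulary (field names are hypothetical)."""
    provider: str
    service: str
    cost: float
    currency: str


# Assumed per-vendor column names; verify against your actual billing exports.
COLUMN_MAP = {
    "aws": {"service": "product/ProductName",
            "cost": "lineItem/UnblendedCost",
            "currency": "lineItem/CurrencyCode"},
    "azure": {"service": "MeterCategory",
              "cost": "CostInBillingCurrency",
              "currency": "BillingCurrency"},
    "gcp": {"service": "service.description",
            "cost": "cost",
            "currency": "currency"},
}


def normalize(provider, raw):
    """Translate one raw vendor billing record into a CostRow."""
    cols = COLUMN_MAP[provider]
    return CostRow(
        provider=provider,
        service=str(raw[cols["service"]]),
        cost=float(raw[cols["cost"]]),
        currency=str(raw[cols["currency"]]),
    )


def total_by_provider(rows):
    """Sum cost per provider once every row shares the same shape."""
    totals = {}
    for row in rows:
        totals[row.provider] = totals.get(row.provider, 0.0) + row.cost
    return totals
```

Once every vendor's rows share one shape, cross-cloud comparisons become a plain aggregation. Currency conversion is deliberately left out of this sketch.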
00:18:27
Speaker
We need this player for that. Yeah, well, I mean, actually, I'll plug this project that I'm involved with. It's called the FinOps FOCUS project, which stands for FinOps Open Cost and Usage Spec. It might be interesting to anybody who works at a multi-cloud shop like us. It's a spec that actually just went GA, 1.0, a couple of weeks ago. We've been working on this for a year and a half, and it just went 1.0 at the big FinOps conference in San Diego about a month ago. And this is an attempt to reverse engineer a spec, or an interface, out of all these different cloud bill implementations that all these different vendors have presented. So that's, like, job number one, just to be able to compare
00:19:15
Speaker
you know, apples to apples, and hopefully in a common currency, you know. So there's that sort of vocabulary normalization. But then once you've done that, there's a vocabulary normalization that has to happen again, or a translation that has to happen again, with your engineering team, because don't forget, your engineering team is the one who's actually spending the

Managing Financial Expectations with Real-time Data

00:19:30
Speaker
money now. And that's the interesting sort of genesis of how FinOps came to be. Back in the 90s, before I was involved in technology, you know, finance completely controlled development and engineering spend. If you wanted to launch a new thing, you had to go through procurement and get a server and get it put in a colo center or put in a rack somewhere. That doesn't happen in the cloud anymore. Engineering, essentially, can spend company money with the click of a button in an interface and totally take it out of the hands of finance. And so that's why, you know, the old ways of managing spend have broken down, because the power to spend and control spend is now in the hands of engineering. And so, like, what you
00:20:07
Speaker
but they're not going to go look at the bills. They don't care about them. Some of them do, but for the most part, finance still controls the bills and the language and the vocabulary of the bill. And so part two of the vocabulary normalization, or translation, is taking all this cloud vendor speak and turning it into the language that your engineering team speaks, in terms of your infrastructure, your architecture, which applications are running, who owns them, what pieces of the architecture are generating what costs, and where the surprises are actually coming from, in the language of your company and your engineering team, not in the language of EBS or, you know, S3 or whatever. These different object store things differ across vendors too much, you know, and nobody really cares or thinks about it that way. So it's like,
00:20:50
Speaker
it's this giant data normalization project, essentially, to boil out a common understanding of what's actually costing money. And then once you do that, this is what I have found anyway, if you just spend a modicum of time having regular calls where we all look at the things together in the language of our company rather than the language of the vendor, then people start finding interesting things in the charts on their own. And engineering optimization initiatives just kind of spawn themselves, you know, because it becomes really obvious once you translate from "oh, EBS snapshots are costing us a ton" to, you know, "our backup regime for disaster recovery on our Ceph layer is pretty expensive. And if we rebalance our Ceph layer in a certain way, it causes that to spike. So let's not do that. And then let's also think about how many snapshots," and blah, blah, blah, you know.
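The second translation step, from vendor vocabulary into the engineering team's own terms, could look something like the sketch below. Only the EBS-snapshots-to-Ceph-backups pairing comes from the conversation. Every other mapping entry, and the names `INTERNAL_NAMES` and `cost_by_component`, are hypothetical.

```python
from collections import defaultdict

# Maps (provider, vendor line item) to the name engineers actually use.
# Only the first pairing is mentioned in the episode; the rest are made up.
INTERNAL_NAMES = {
    ("aws", "EBS snapshots"): "Ceph disaster-recovery backups",
    ("aws", "EBS volumes"): "Ceph storage layer",
    ("gcp", "Persistent Disk snapshots"): "Ceph disaster-recovery backups",
}


def cost_by_component(line_items):
    """Aggregate (provider, vendor_item, cost) tuples by internal component.

    Anything without a mapping lands in "unmapped" so it stays visible
    instead of silently disappearing from the report.
    """
    totals = defaultdict(float)
    for provider, vendor_item, cost in line_items:
        component = INTERNAL_NAMES.get((provider, vendor_item), "unmapped")
        totals[component] += cost
    return dict(totals)
```

Keeping an explicit "unmapped" bucket is a design choice of this sketch: it shows how much of the bill still lacks an owner in the company's own language.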
00:21:38
Speaker
It starts to run itself, and it starts getting looked at more like a performance optimization, which is a lot more fun than cost avoidance, you know, as a term. Yeah, it's a beautiful way to look at it. I had my own, like, four categories that I put things into. And one is, you know, cost cut. And then there's cost containment slash avoidance. But it's cost containment of like, hey, for our new stuff, we're looking into it and we're looking at pre-optimization and communicating that,
00:22:09
Speaker
and then cost forecasting, and cost allocation when it comes to, you know, customers, but especially just for accounting purposes. And if you cover those four, especially the forecasting... You know, I always said the top three things finance hates: surprises, surprises, surprises. Number four is spending too much money. They don't like spending too much money, but the number one, two, and three things... Their job is to forecast. Their job is to tell you, are we spending money appropriately? Do we know what we're going to be spending so that we don't overspend?
00:22:47
Speaker
And so the more that you can empathize with them, and exactly what you said, like, go in there... You know, we had spiraling S3 costs, and what we realized was that we told everybody we had 18 months' retention of your data. Oh. But it was... This happens all the time. No, it was far past that. It was literally... I know. You don't even have to finish the sentence. I know it's far past that. It happens to everyone. Right. Yeah. And even from an engineering standpoint, we didn't have appropriate things in place to be able to safely delete that data at the time. And so, you know, our language was, we will do at least 18 months.
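A minimal sketch of enforcing an "at least 18 months" retention promise might look like the following. It assumes data is organized in monthly partitions, which the episode does not state, and the helper names are hypothetical. Deleting in small batches is a cautious default of this sketch, not something described by the speakers.

```python
from datetime import date


def months_between(older, newer):
    """Whole calendar months from `older` to `newer`, ignoring the day."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def expired_partitions(partitions, today, retention_months=18):
    """Return monthly partitions strictly older than the retention window.

    partitions: dates naming each monthly partition (first day of month).
    Using a strict comparison keeps the "at least 18 months" promise.
    """
    return sorted(p for p in partitions
                  if months_between(p, today) > retention_months)


def batches(items, size):
    """Yield items in small chunks so deletes can be spread out over time."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

The actual delete call is intentionally omitted, since it depends on the storage system in use.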
00:23:29
Speaker
And so we were, you know, covered, CYA from a legal standpoint, but we started to say, like, how do we delete this

Transparent Communication for Successful FinOps

00:23:36
Speaker
stuff out? We were looking at how we would delete a bunch of this stuff from our Elasticsearch clusters. It was like, well, it might take our clusters down entirely to start deleting this data if we just try and click delete and delete, you know, two years of back data. So this would take our entire service down for five to six days. So how do we balance what matters? And at the time, our chief product officer was very much focused on getting the SaaS to parity with our downloadable software. So it was like, features are more important than costs right now. And we were able to quantify that. And we were able to get approval on that. So that communication of what matters and why, you do have to get that into people's language and be like,
00:24:20
Speaker
hey, we could do this thing that's going to save us 10% of our costs, but it's going to take the next year and a half to implement. And that's all our engineers are going to focus on. So one, you know, we can't do any new features. Two, our performance is going to suck. And three, all of our engineers are going to leave because that's not exciting or interesting. So, like, how do we balance this? And we're always going to be overspending, right? Like, I found out almost all of our queries weren't hitting our read replicas. They were only reading from, you know, the leader replica for our RDS instances.
00:24:54
Speaker
And it was like, okay, well, we could just shut off all these read replicas. And they're like, no, we need to rewrite our queries, because we need to have these reading from the read replicas, because it's just smarter to do that, even though we weren't using all that stuff very much. We were using like 4% of the compute or whatever.

Complexities of Cost Allocation in Container Environments

00:25:12
Speaker
But, you know, it was a real question of, hey, here is something that can save us, you know, $500,000 a year, and I can do it in the terminal if you give me the ability to chop it off from being a replica and then shut it down. Like, you don't need to do it from an engineering standpoint, but is that what matters? And that conversation in FinOps
00:25:34
Speaker
of not just reporting the what is, it's what should be and why. And again, when you said getting it in the language of everybody so that they can understand: why are we doing this? Does this matter, or is this an effective spend of our money? That changes the tone of the conversation, instead of, I was surprised by the bill, therefore cut your bill by 20%. Yeah, exactly. You know, I mean, if you can pose it as a performance optimization problem, then it's fun, you know what I mean? Performance optimization, to some of us, is like the funnest part of development. You know, it's like, yay, I like building things, but what I like more than anything is, once they're built and people start using them, making them faster in a smaller footprint. That's really satisfying to me as an engineer. A thing that you said, like,
00:26:22
Speaker
I don't want to miss the opportunity to give props and credit to the people who held the FinOps role in the finance team previous to me. It wasn't like there were massive fires and obvious problems with the infrastructure and nobody had any idea what was going on. That was not the case. We actually have run a FinOps program since, I guess, probably around 2017, 2018, and we were doing it by virtue of, you know, we've always been cloud hosted. We're a cloud native company, and therefore our cloud bills are the thing that can kill us if they get too big. So we've always had an eye on it. And actually, you know, I inherited not a huge mess. I mean, actually, probably as good a scene as just about anybody has inherited. The points of, I guess, contention came up with, you know, this being a finance-led effort, and finance always has expectations from
00:27:16
Speaker
their bosses about what kind of reporting they need to deliver. And I was coming at this from a data perspective, understanding the reporting that needed to be delivered, but also understanding that the current status quo vis-a-vis our data was not able to support the future reporting workloads and understanding and visibility that we were going to need down the line. And so I had to, or rather I chose to, basically stop delivering the things that finance was asking for. And that caused, in hindsight, not a great situation with the relationship, but there was no other way to get past it. We had kind of climbed that mountain their way as far as we could possibly go, and we just couldn't go any further. And so we had to, like,
00:28:01
Speaker
pull back and kind of start over from scratch, doing things our way. And that's what I've been doing for the last year and a half, really: starting back at the bottom of the mountain and normalizing, merging these bills into one place, so that instead of having three different dashboards for our three different vendors that are generating costs, we can just have one that we can actually compare across, and then all the products that are downstream of that, because you're still just in sort of step zero with being able to normalize cloud bills. There's all this, not just normalizing the language for our engineering, but we're a containerized shop. And so there's this whole, like,
00:28:36
Speaker
fabulously interesting cost allocation. I never would have thought cost allocation would be interesting, but it is the most interesting, most detailed work. It uses every bit of the experience that I've had here in the last eight years between data and infrastructure and architecture and talking to customers. We have this containerization platform where basically all of our customer workloads are launched into this abstracted sort of alternate dimension that is the containerization framework. And then they spend our money on this other side of this opaque barrier. Finance had no idea what to do with that. We'd just dump stuff into this bucket and then kind of make some guesses.
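One way to picture the allocation problem described here is to split a shared host's cost across the containers that ran on it, in proportion to what each one used, and then roll those shares back up by owner. The sketch below is only an illustration under stated assumptions: the function names, the "resource-seconds" usage unit, and the sample numbers are hypothetical and are not taken from Platform.sh's actual framework.

```python
from collections import defaultdict


def allocate_host_cost(host_cost, container_usage):
    """Split one host's cost across containers in proportion to usage.

    container_usage maps a container id to the resource-seconds it consumed
    on the host (a hypothetical unit chosen for this sketch).
    """
    total = sum(container_usage.values())
    if total == 0:
        return {}
    return {cid: host_cost * used / total for cid, used in container_usage.items()}


def roll_up(allocated, owner_of):
    """Merge per-container costs back together by owner (customer, project...)."""
    totals = defaultdict(float)
    for cid, cost in allocated.items():
        # Containers with no known owner stay visible instead of disappearing.
        totals[owner_of.get(cid, "unallocated")] += cost
    return dict(totals)


# Hypothetical sample data: one host that cost 12.0 for the period.
usage = {"c1": 3600, "c2": 1800, "c3": 600}
owners = {"c1": "customer-a", "c2": "customer-a", "c3": "customer-b"}

allocated = allocate_host_cost(12.0, usage)
by_owner = roll_up(allocated, owners)
```

Proportional splitting is just one possible policy. A real allocation would also have to decide who pays for idle capacity, which this sketch silently spreads across whatever did run.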

Challenges in Tagging and Cost Understanding

00:29:15
Speaker
Part of the work that I've been doing for the last year has been allocating costs out into
00:29:19
Speaker
these hundreds of thousands of containers every day that are constantly being created, destroyed, moving machines, and allocating costs to these things that are just ephemeral and abstract, and then merging them back together further downstream. And none of that could have happened without a solid data base to build upon. So, yeah, it has worked out in the long run, but I do own some of the contention. Like, yeah, I created it, you know what I mean? I did that. Yulia, I want to hear your question. I have a question. There's one quote that makes sense, which is, at the very start, what you said was, sometimes you don't get what you want, but you get what you need. And that was the rebuild of that. You don't always get what you want. And then, yes, the abstractions on abstractions of cloud, and then containers, and optimizing the things that are running in the containers, but also your settings for the containers. Because we had containers that were way over-optimized on memory, and we were like,
00:30:17
Speaker
And then optimizing how far up on your Kubernetes instances you go, where you ask, do we go right up to the top, where we're going to have CPU lock on all of these because everything might be running right at the height of CPU? Yeah, it's a fun optimization-on-optimization-on-optimization challenge. Yeah, exactly. You had a question. I do have a question. John, tell me, how do you see infrastructure FinOps and data FinOps? How different are they for you?
00:30:49
Speaker
Oh, good question. I guess you have this data platform experience you've been in. Yeah. And it's different. And this is where I started. How do you see the difference? I know it's not a big chunk for you guys, and you said that you're still waiting to tap into it, but why is it different, and what is the challenge behind it?
00:31:17
Speaker
Gosh.
00:31:20
Speaker
Probably what's different about it is just understanding the, I guess I'll say, cost drivers, to use the industry term. Understanding the cost drivers of your data setup versus your infrastructure setup. In our infrastructure setup, with the instances that we launch, we don't use a Lambda architecture or a serverless architecture here. We spin up servers on behalf of our customers, so those things are long-lived. The storage is long-lived. Versus, we're a BigQuery shop, and every BigQuery job lasts for however long it lasts, and then it disappears and it never happens again. And so understanding how to tag those individual jobs.
00:32:05
Speaker
First of all, even having the capability to tag those individual jobs, you know? Like Metabase, we use Metabase. Metabase puts a couple of, or maybe it doesn't. I actually can't remember if Metabase tags the BigQuery jobs that it spawns. I'm not sure it does, actually. If it does, it does it with attributes that I don't find meaningful. So go search GitHub for Metabase BigQuery tagging, and you'll find the issue that I'm on there to actually add custom tagging, to be able to surface which users, which data sets, blah, blah, blah.
00:32:39
Speaker
So it's just because the world is not mature yet. We're getting there, and it'll come along eventually, but it's just not there yet. Those of us who care about what Metabase costs, versus those who care about the front-end functionality of Metabase, there are a lot fewer of us, you know what I mean? So there's that.
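BigQuery lets a job carry key/value labels, and the point of the tagging discussion above is that those labels are what make short-lived jobs attributable afterwards. As a rough sketch, assuming job records have already been exported into plain dictionaries (the field names `slot_ms` and `labels`, the `dashboard` label key, and the sample rows are all hypothetical), usage can be summed per label value while keeping the untagged remainder visible:

```python
from collections import defaultdict


def usage_by_label(jobs, label_key):
    """Sum slot-milliseconds and count runs per value of one label.

    Jobs that do not carry the label are grouped under "untagged".
    """
    summary = defaultdict(lambda: {"runs": 0, "slot_ms": 0})
    for job in jobs:
        value = job.get("labels", {}).get(label_key, "untagged")
        summary[value]["runs"] += 1
        summary[value]["slot_ms"] += job["slot_ms"]
    return dict(summary)


def untagged_share(summary):
    """Fraction of total slot-milliseconds that cannot be attributed."""
    total = sum(entry["slot_ms"] for entry in summary.values())
    if total == 0:
        return 0.0
    return summary.get("untagged", {"slot_ms": 0})["slot_ms"] / total


# Hypothetical job records, e.g. exported from a jobs listing.
jobs = [
    {"job_id": "j1", "slot_ms": 4000, "labels": {"dashboard": "revenue"}},
    {"job_id": "j2", "slot_ms": 6000, "labels": {"dashboard": "revenue"}},
    {"job_id": "j3", "slot_ms": 2000, "labels": {"dashboard": "churn"}},
    {"job_id": "j4", "slot_ms": 8000, "labels": {}},
]

summary = usage_by_label(jobs, "dashboard")
share = untagged_share(summary)
```

The run count per label also gives a first view of how often a repeated process fires. The untagged share shows how much spend a tool contributes when it does not label the jobs it spawns.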

Critique of BigQuery Pricing Model

00:32:58
Speaker
Then there's just understanding your tools well enough to be able to tag jobs, which is really what you're talking about in BigQuery land. I'm not sure about Snowflake or Databricks because I've never worked with those directly. But once you're able to tag jobs correctly with the attributes that you need, then it's very similar. But just understanding the differences in spawning an instance versus spawning a query job, and what options you have to tag those query jobs and where that happens, that's usually a pretty technical
00:33:27
Speaker
I guess. And also, you know, tagging itself doesn't, like, you can see the cost, but you don't necessarily see the problems with it. Because what if the job is running too frequently, or what if the job is transferring all the data and not doing the diff? Just like the one that you, Scott, were mentioning, from the API. What if the jobs are unoptimized? It's hard to see that on a job level. Because BigQuery gives you the opportunity to look at the job level, okay?
00:34:11
Speaker
And this is what we talk about: they make it available to understand how much each job costs. But when we are talking about Metabase and how it returns data straight from Google BigQuery, it doesn't do it in one job. It's a repeated job. And what you care about is not the job level. You care about this repeated process: how much it costs, what the frequency is, who owns it, how it behaves. And this is, I think, a challenge, at least for me when I'm looking into it. Yeah, but also, what is interesting, you were very much perplexed with slots, the editions pricing model for Google, because they announced it last year and we had a nice chat about that. What do you think the problem is with that? Why are you not a fan of it?
00:35:00
Speaker
So, why I'm not a fan of it, and I think that's an accurate representation, I am not a fan of it. The reason is that the previous model we were on was the flat-rate pricing, which, I understand, is very dumb. It's like provisioning; it doesn't invite optimization, which is what they're trying to invite and what I ultimately want. I want to be incentivized. So that's fair. But it's also a pain in the butt, because it has made our BigQuery costs variable again. Storage is what it is, but on the compute side of things, the flat-rate pricing was: you pay X grand a month and you get X number of slots, and that's what you have, and you cannot go over it, and you know exactly what you're going to... The editions model is variable almost no matter what you do. And that is annoying to me, first of all, because it makes things harder.
00:35:53
Speaker
Did you realize that you talk like a person from finance? Yeah, a finance person right now. You want it to be predictable. That's what you're struggling with. Yeah, I mean, yeah, that's not... Okay, there you go. A side of you is not a data person anymore; you are a finance person. I guess, yeah. I mean, technically, I report to the CTO, but I get more of my marching orders from the CFO. I kind of have a very strange reporting relationship. So yeah, that's fair, sure. I mean, do data people want things to be unpredictable? I don't think so.
00:36:30
Speaker
Yes, they do. They like chaos because it injects more interesting data to analyze. That is one of the inherent problems of this: they want to optimize and optimize and optimize, and they want to optimize for things that aren't what matters to the other constituents. You talked about forecasting. That is for your finance constituent, and it's also for your technical constituents, you know, the CTO. Our org was structured a little bit weirdly, in that the engineering org was under the chief product officer. And the chief product officer told me he didn't want me to be hired. Not me specifically, but he was like, I don't want somebody in this role for another six months, because I want to be able to spend without somebody coming in. And so I optimized for getting him the ability to spend what he needed
00:37:20
Speaker
But that we reported. So yeah, I do think they like the chaos because it creates more interesting data. I really don't think data people want stuff to be predictable until they get to a senior level and they realize the pain that unpredictability causes their organization, and especially their line of business or part of the organization. I want to defend data people right now, because they don't want chaos. I think they want to innovate and test and experiment more. And the freedom of not being limited in something, and not thinking about costs too much, gives them this ability and, you know, mentally allows them to experiment and not to worry about the
00:38:03
Speaker
commitments they made in front of the business. I'll put it this way: from my own personal perspective, there is so much entropy in the system that I deal with that I can never get rid of it all. And so I love chaos, because I love bringing order to a little bit of the chaos for a little bit of the time, and then moving on to another thing. I don't want to get away from the point, because I actually haven't finished my answer about why I don't like the editions model. But any amount of order that I can bring to one little thing, it's just satisfying to me, because there's plenty more chaos out there to deal with. And I love it. I love being dropped off in the middle of an unknown situation with no map. That's one of my favorite things. But let me finish the answer about why I don't like the editions model: it's because the editions model
00:38:50
Speaker
decouples the, you know, it makes it variable. It makes it so that I have to care essentially down to the level of individual query performance. I have to care about individual query performance, but it does not allow me to directly connect individual queries to the costs they generate. So that is my problem, because if I go to the INFORMATION_SCHEMA jobs table, and I list out all the jobs and tally up all of the total slot milliseconds that every

Promoting FinOps Practices Across the Company

00:39:16
Speaker
job has used, I get a number. And if I go over to the bill and I tally up all of the slot milliseconds that I've been billed for, they correlate, but it's a radically different number. And the reason for that is because... uh, it's telling me that I've lost connection to you. Okay.
00:39:37
Speaker
The reason for that disconnection is this autoscaler thing that they created. You're aware of the BigQuery autoscaler. It sounds great on paper, and it looks great in the marketing materials: your capacity automatically scales up to your business's needs. I wrote a post about this on my blog, The FinOperator, www.thefinoperator.com. Go check it out. The link is in the show notes. Yes, please. Because right after editions came out last July, I wrote up a post about our personal experience with it. And it just doesn't fit the marketing materials. Surprise, surprise; no big deal, that's okay. But it really doesn't fit the marketing materials. And the autoscaler essentially represents the capacity that you're being billed for.
00:40:21
Speaker
But your individual queries represent the capacity that you're actually using, and the capacity that you're provisioning, that you're kicking off to be provisioned for you, is always going to outscale what you're actually using. And you can correlate the two and probably come up with some sort of cost allocation. You could figure out what to optimize. But if you're going to make it so that I have to care about performance again at an individual query level, then you need to give me the information to be able to connect the individual queries to the precise number of millionths of a penny that they actually cost my business. So that's my... I know this is coming. I love BigQuery. I love most of Google's products from a product point of view. But right now, it's annoying. We're in the annoying time where they've launched version one.
00:41:09
Speaker
And we haven't gotten to version two, where the rest of the things that are obviously missing from version one are going to come out. So that's my beef. Hopefully. I see. No, it totally makes sense. And now I can tell you that lots of our customers, because we're serving Google Cloud users, are frustrated by the same thing: there is no clear visibility. And for some reason, Google BigQuery billing doesn't allow you to dig in this deep.
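The gap John describes, between the slot-milliseconds that jobs report using and the slot-milliseconds that show up on the bill, can at least be measured, and the billed amount can be spread back over jobs in proportion to their usage. The sketch below is only the rough correlation he mentions, not how Google actually attributes cost. The function name and the sample figures are hypothetical, and the inputs are assumed to have been tallied already from the jobs metadata and the billing export.

```python
def allocate_billed_cost(job_slot_ms, billed_slot_ms, billed_cost):
    """Spread a billed amount across jobs in proportion to slot-ms used.

    Returns (overhead_ratio, per_job_cost), where overhead_ratio is billed
    slot-ms divided by used slot-ms. A ratio above 1 means more capacity
    was billed than the jobs report consuming.
    """
    used = sum(job_slot_ms.values())
    if used == 0:
        raise ValueError("no slot usage recorded; nothing to allocate against")
    overhead_ratio = billed_slot_ms / used
    per_job_cost = {job: ms / used * billed_cost for job, ms in job_slot_ms.items()}
    return overhead_ratio, per_job_cost


# Hypothetical figures for one billing period.
job_usage = {"nightly_etl": 600_000, "dashboard_refresh": 400_000}
ratio, costs = allocate_billed_cost(job_usage, billed_slot_ms=2_500_000, billed_cost=50.0)
```

In this made-up period the bill covers 2.5 times the capacity the jobs used. Proportional allocation hides which jobs triggered the scale-up, which is exactly the missing information the conversation is about.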
00:41:41
Speaker
I don't think it's for a reason; it's just hard for them as well, because it's a big corporation and developing new features takes forever. And yeah, so it's a whole different discussion. But what's really interesting to me is that, okay, you are a senior director of FinOps and you have a lot on your plate. Okay, even doing this internal semantic layer where you can map all the services you're using from different clouds, to translate them so that we can wrap our heads around the fact that they're the same services from other cloud providers. But to optimize things, you are not rolling up your sleeves and going to optimize it yourself. You have to onboard other people to make it happen. And these are not necessarily your subordinates,
00:42:36
Speaker
and they need to be incentivized. There is lots of people work involved. Previously, you mentioned that you didn't enjoy the work of evangelizing and championing the data platform, which seems easier to me because it's like, okay, this can help you do this. While now you're coming in as a senior director of FinOps and saying, okay, you see this? I don't know if you made this, but you need to clean it up. I mean, how is it going?
00:43:08
Speaker
How's it going? Honestly, it could be going better, because I am naturally drawn more to the building side of things than I am to the evangelizing-the-things-I-built side of things. Companies have marketing teams precisely because the engineers are not drawn to marketing the things that they built, right? And I am a team of two. I have one teammate who manages tons on her own, but she's also not a marketing professional. At some point, I need to stop building, or be allowed to stop building for a while, to move into it. It's just a different side of my brain. I attempt to disengage and write about things and quantify, because if you build it and nobody knows about it, it doesn't matter. I understand that. But it's also very difficult for me to switch
00:43:57
Speaker
from one to the other within the same day, or even the same week a lot of the time. I'll spend entire weeks where all I do is write about the things that I built out in the previous week. I need to do more of that. But with the finance team, I'm slightly a victim of our own success, in that we've been able to build what, at least to me, feels like a very satisfying and, dare I say, a little bit magical framework, where we've been able to allocate these costs through these alternate dimensions and bring them back together in ways that can get boiled up to our board. There's still more of that. I'm not quite done with that yet, you know what I mean? There's the hosting costs, and then there's other new stuff.
00:44:40
Speaker
Yeah, it's fun

Collaboration Between Engineering and Finance

00:44:41
Speaker
work. I enjoy doing the work, but at some point you need to stop and talk about the work, or else nobody knows it's happening aside from the CFO and FP&A and the handful of people who directly use it. But there's a whole other audience in my company for it, and that's our considerable engineering org. And I've got meetings with them that happen, but it would be really cool to just be able to preach this performance optimization gospel more widely. And some of this data is big and not just boiled down for you. If you were going to walk into some of this cost and cost allocation data today, you'd get lost instantly. You would bounce out of the table and you would never come back. You know what I mean? And so
00:45:24
Speaker
there's just a never-ending process of education and evangelization. I could, and this is the feedback I routinely get from my bosses, be better at it. I don't need to be defensive, but it's a hell of a lot of work, man. How I get these things done, and I guess I'm maybe sharing too much, is by monotasking. You know what I mean? There have been periods in my career when I was in charge of keeping 20 plates spinning, and I'm capable of doing it, but this work is much more satisfying to me: to be able to focus on this problem of cloud costs and tie together all of these different things. I get those things done by focusing and shipping a giant thing.
00:46:13
Speaker
So, you know, what would it look like, for instance, if you kicked off a project, let's say optimizing Google BigQuery costs? I mean, I don't believe you're going to go and optimize those ETLs or any processes yourself. What I understand is you're going to assess and understand the benchmark where you're starting, and you will assess what can possibly be done, but who would be engaged to actually do the work? Because maybe you're doing the work, which I presume is not realistic.
00:46:44
Speaker
Yeah, no, I mean, I enjoy that work, but it's not my data platform anymore. You know, I'm a customer. I built it, and now I'm outside of it. And so, yeah, I mean, I don't know. I think it all starts with just freaking dashboards that the execs can understand, right? Everything meaningful happens with a good dashboard that the CTO or the CEO understands. For real, because I can sit here and bang the drum from below all day, and some people will get interested in that, but there's no motivation like the motivation that's put there by our bosses. So you do the benchmark and present it to your CEO and CFO? Put it upstairs and then let them decide priorities.
00:47:22
Speaker
Okay, that's very interesting. This is an insight. One thing that you've been saying that I think a lot of people don't expect when you talk about FinOps is about the finance team. There's always this idea that the finance team is super contentious. And once you read them in, they still want you to cut costs and they still want you to forecast better. But from my experience, especially, you know, I had the nicest FP&A team. Chris and Jeff, loved working with you guys. The finance team engages far more on FinOps. A lot of times the engineering team's incentive is to get this to good enough so they can focus on the things that they think matter. And so there's that overall company incentivization, to say, does this matter to our company? When I first started, like I said, the chief product officer said, I didn't want anybody in this role.
00:48:17
Speaker
I went to the company onboarding a little late.

Engagement in FinOps: Finance vs. Engineering

00:48:22
Speaker
You're supposed to do it in your first two or three months, and I was at month five, but we had already stabilized our AWS costs. And when the CEO came in, everybody was going around saying what they do. And I'm like, I'm the cost manager for our cloud costs. And the CEO looked at me and went, great job on stabilizing the costs. I'm really happy with what you've done so far. And everybody's looking at me like, what? And I'm like, it wasn't me. It was more everybody else, a team effort. It was the FP&A guys. But, like,
00:48:48
Speaker
There are people that are going to want to engage with you on this. But again, you do have to figure out how this fits in. And costs are not the thing that drives most companies forward, right? If you cut your costs down to zero, but that's because you've shut down everything that you're offering, your company goes out of business, right? So figuring out how that sits within the organization and what the prioritization is, and being crisp and clear that this isn't the highest priority for the company and that that's okay, but that we're all on the same page, is tough. But I found it works so much better to be like, let's have an open and honest discussion on this. Yeah, we went from provisioned DynamoDB, which had, you know,
00:49:35
Speaker
very little variability, because we did have some auto scaling, but DynamoDB auto scaling would take 15 minutes to kick in, and we'd have these spikes that would last two minutes that were just massive, and then it would kick in well after the spike had already passed. Yulia's giving me a bit of crap in the chat, as per usual. But then we moved to on-demand, and so it was much more variable, but our costs went from 550,000 a year to 50,000 a year. Right. Because so much of our stuff was provisioned at, like, 10, and it would maybe spike up to 10 once a month, but it was at, like, one every 20 minutes most of the time, right? And this is your provisioning per minute, and it was like, we're not using any of this, right?
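The trade-off in this story, paying for capacity provisioned at "10" while typically using about "1", comes down to a break-even utilization. The sketch below uses invented unit prices purely for illustration. They are not real DynamoDB prices, and actual on-demand billing is per request rather than per consumed unit-hour, so treat the function names and numbers as assumptions.

```python
def break_even_utilization(provisioned_price, on_demand_price):
    """Average utilization below which on-demand is the cheaper mode.

    provisioned_price: cost per provisioned unit-hour (hypothetical).
    on_demand_price: cost per consumed unit-hour (hypothetical, higher).
    """
    return provisioned_price / on_demand_price


def compare_modes(provisioned_units, avg_used_units, provisioned_price, on_demand_price):
    """Return the hourly cost of each mode for a given average load."""
    return {
        "provisioned": provisioned_units * provisioned_price,
        "on_demand": avg_used_units * on_demand_price,
    }


# Invented numbers mirroring the anecdote: provisioned at 10, usually using 1.
costs = compare_modes(provisioned_units=10, avg_used_units=1,
                      provisioned_price=1.0, on_demand_price=5.0)
threshold = break_even_utilization(provisioned_price=1.0, on_demand_price=5.0)
utilization = 1 / 10
on_demand_wins = utilization < threshold
```

With these made-up prices, on-demand stays cheaper as long as average utilization is under 20%, even though each consumed unit costs five times more.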
00:50:21
Speaker
And so once they released DynamoDB on-demand, it was just like, hey, now we don't have to think about provisioning, and it saves us a bunch of money. Let's implement this. And it took some time in the code, but everybody was like, we're so much happier with this. And it was like, yes, we spend a little bit of time, but we have a repeatable way to do this because we had good reusable practices. There was only one that had created their own DynamoDB library, which caused all sorts of issues. But the more that you start to talk to people about why does this matter, and what does it matter, and how much does this matter... You know, you guys are both married. In successful relationships, it's one of those things of, hey, this thing bothers me. Well, how much does it bother you? Like, hey, you chew with your mouth open. Okay, that's probably going to be something that's a really big pain for most people, because
00:51:12
Speaker
you're going to be eating around people all the time, right? Versus, hey, when you do this one thing that happens twice a year, it slightly annoys me. Okay, you don't really need to optimize that for your relationship to be better, right? And understanding how much pain this is, how much friction this causes, is really a difficult conversation, because the finance people, again, are programmed to say, you've got to help me forecast better and cut costs. Those are the only two things. And then you start to go, but how much does this matter? And why does it matter? Their immediate reaction is, you're trying to weasel more budget out of me. And you're like, no, I'm trying to understand you. And you make them feel seen and heard, and they lean into the conversation. At least that's what I always found. Have you found the same thing? Or, when you're talking with other people, that they're actually more friendly with their finance people than they are with their engineering people?
00:52:02
Speaker
Well, first of all, I just want to point out that taking this data-driven approach to marital conversations has, actually 100% of the time, had the exact opposite effect to the one that I'm trying to,

Community Support and Open Collaboration

00:52:13
Speaker
you know. Second of all, I don't know, you were talking a minute ago about some things, and I want to give a shout-out to the FinOps Foundation, because the FinOps Foundation is a thing that exists, a subunit of the Linux Foundation. It's sort of the official structure around the FinOps community. It's not an open source community that builds software so much, but it is an open source community as a subcomponent of the Linux Foundation, and it is the coolest open source community that I've been part of since the Drupal days, a long time ago. It's a really vibrant, awesome community for all these new people who kind of get dropped into this role, because I don't think any of us intentionally wind up in FinOps. We all sort of end up in FinOps
00:52:53
Speaker
accidentally, because we care about these things, and then we ask the right questions, and then one day we wake up and realize we do FinOps. This is our job. And so there's this whole structure around it now that can help people, once they wake up and realize what they do, actually move further down the road. And the FinOps Foundation does a lot of talking about, rather than avoiding costs or optimizing costs, improving the value of cloud. Those are the words that they use over and over and over. Because, to your point, people don't necessarily care about avoiding costs or optimizing costs, but what you're really trying to do is get more value out of the things that you're spending money on.
00:53:33
Speaker
Making it more efficient. Yeah, exactly, exactly. We're spending X and we're getting N out of it. Well, can we spend 0.9X and still get N out of it? Because then you've improved your margins by 10%. It's those kinds of conversations that, to your final point, Scott, can be tricky to have with engineering, because there are more layers in between the final optimized number, which is dollars and lives over in finance, and the things that they're building and working with. And so, to me, that's what makes it so much fun. But it's also what makes it hard. In the meetings that I have with engineering leadership, where we're talking about parts of the architecture and infrastructure that could be optimized, a lot of times those things take a long time to ship
00:54:18
Speaker
because the big money is in big parts of the architecture, in optimizing rather large things. We don't have situations, and this is to the credit of the previous FinOps team here, where we have massive overnight spikes. Well, I mean, maybe we do every now and then because of data transfer and bandwidth abuse via our, who knows, customers. But we don't have the giant fires that can break out for a lot of people, or companies rather, that are brand new to a FinOps practice. Yeah, I don't know, several points mixed into one long paragraph. Sorry. No, no, you're great. And it actually makes a ton of sense to me. I have the last question before we break up.
00:55:00
Speaker
I tend to think, maybe I'm looking for validation right now, but anyway, I tend to think that what makes a good data engineer into a great one is understanding the cost of the resources they're using and communicating it to their business stakeholders: explaining what it's going to cost them, and why real-time data is going to be slightly, just slightly, more expensive than a batch every 15 minutes or whatever. Explaining these things and mapping them to use cases makes a data engineer or data person great. How do you feel about that? Do you have this culture set up at your organization currently, or what do you do with it?
00:55:50
Speaker
We're getting there. I mean, again, it comes back to the original problem, which is that the bills live over in this one part where engineering is not incentivized to go look. And then if they do get curious and go over there, they're greeted with this whole terminology and vocabulary. And do you think they have permissions to see that? In Google Cloud, you don't have the permissions to see it, even with this. Yeah, that's precisely it. And so that's sort of the foundational problem for all of these things. It's just being able to surface that visibility. And that's where, to me, this whole thing is really just a very specialized and very complicated, but also kind of really simple, routine BI problem. You know what I mean? It's just reporting and getting people access to the data that they need when they need it, which is the, you know,
00:56:38
Speaker
the standard Kimball BI playbook. FinOps is that, with some very specific use cases that can save your company a hell of a lot of money. You know what I mean? So I don't know. Yeah, I agree with you. However, there's this problem of gatekeeping access. Yeah. The vocabulary of the data is not one that makes sense to any engineer, or data engineer, or anybody, really. I mean, it is a bill on the business. They don't necessarily understand how it's used, you know, maybe. That's exactly what is so fun about it to me, because it's this specialized sort of data topic that ends up spanning the entire business. You know what I mean? And it's so gratifying and fun to work on this one focused thing that has reaches that just don't end within the bounds of the company. Okay, John, that was so cool.

Conclusion and Invitation to Connect

00:57:33
Speaker
I enjoyed the conversation and I feel like we could continue for hours. I just feel so energized. Thank you so much for sharing. Please feel free to connect with John over LinkedIn. I will drop the link into the description of the podcast. Yeah. Thank you so much. Thanks, Yulia. Thanks for inviting me to do this. This has been a ball, and Scott, it's great to meet you. It was so cool to have you. Thank you for joining us. Thank you.
00:58:29
Speaker
I'll be later.