Introduction by Tom Moore
00:00:00
Speaker
Hello basement programmers and welcome to the basementprogrammer.com podcast. My name is Tom Moore and I'm a developer advocate working for Amazon Web Services. The opinions expressed in this podcast are my own and should not be assumed to be the opinions of my employer or any other organizations I may be associated with. Also, basementprogrammer.com is not affiliated with Amazon in any way.
Focus on Cloud Cost Optimization
00:00:24
Speaker
This month's episode is focused on a topic that may not seem glamorous, but certainly is important to every customer I've ever dealt with: optimizing cost and reducing the bill at the end of the month. For a lot of customers, this is one of the goals of adopting cloud-based technologies: to reduce the total cost of IT expenditure.
Transitioning to Cloud: Cost Implications
00:00:48
Speaker
Now, you can absolutely achieve a good deal of cost savings and cost reductions by moving to the cloud,
00:00:54
Speaker
if you understand how to make the most out of cloud. This does require some changes in the way you think about procurement of IT resources and the cost of those resources. The rules change when you move to the cloud. Knowing and really understanding those changes and using them to your advantage will help boost your level of success. The first mistake that people make is basing their comparison on what they view as an apples to apples comparison of cost.
00:01:23
Speaker
And I say this is a mistake because there really is no such thing. First up, purchasing in a data center, you're typically purchasing hardware on a refresh cycle. Now, in theory, this is usually three years. In practice, I see a lot of customers pushing this out to be five years. A very common practice is to then shift that old hardware, somewhere around that three-to-five-year mark, off to a DR facility or dev/test.
00:01:51
Speaker
This creates a false sense of the cost associated with those servers.
Common Mistakes in Cost Comparison
00:01:56
Speaker
So a customer will go out to the supplier's webpage, or maybe send a list of specs out to the sales rep, and they'll get a price back. Just picking something at random here, I'm looking around on the web, I see a 2-proc, 24-core server, 384 gig of RAM, and I get a price in the area of about $40,000, OS included.
00:02:17
Speaker
They'll then divide that cost by the number of hours in five years, so 43,800, and arrive at a cost of about 91 cents per hour. They will then use this price as a way of comparing the cost of operating in the cloud. Now right away we have some issues with the cost of procurement. When you go out and you buy the hardware from the vendor, you're buying just the hardware, and possibly the OS licenses as well, depending on your agreements in place.
00:02:46
Speaker
When you rent an instance from AWS, you aren't getting just the hardware, you're getting the data center along with it. That means that hourly cost includes the power, the cooling, the lighting, the rack space, et cetera. You're also getting all the physical maintenance that goes along with that server. If a bit of hardware fails, don't worry. That's covered as long as you're renting the server from AWS. If something fails, the hardware gets replaced. All of those costs are included in the hourly rate.
00:03:17
Speaker
And of course, security is job number one. So you get that world-class security designed to meet the needs of the most security conscious customers built in. The level of security that goes into the AWS data centers far exceeds the level it would be practical to implement for the average or even some very large customers.
Leveraging AWS Expertise for Cost Savings
00:03:35
Speaker
When you really start breaking down that true cost of operating in the data center, customers will see that the cost differences really start to shrink.
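To make the comparison above concrete, here is a minimal sketch of the amortization arithmetic. The $40,000 price and five-year life come from the example; the yearly overhead figure (power, cooling, space, maintenance) is purely a made-up placeholder, since the episode gives no number for it.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days ignored


def amortized_hourly_cost(purchase_price, years, yearly_overhead=0.0):
    """Spread the purchase price, plus optional yearly running costs, over every hour of service."""
    total_cost = purchase_price + yearly_overhead * years
    return total_cost / (years * HOURS_PER_YEAR)


# Hardware only: the "apples to apples" number from the example.
hardware_only = amortized_hourly_cost(40_000, 5)

# Same server with a hypothetical $6,000 per year for power, cooling, space and maintenance.
with_overhead = amortized_hourly_cost(40_000, 5, yearly_overhead=6_000)

print(f"hardware only: ${hardware_only:.2f} per hour")
print(f"with overhead: ${with_overhead:.2f} per hour")
```

The point is not the specific overhead value, which will differ for every data center, but that the hardware-only figure leaves out costs that the cloud hourly rate already includes.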
00:03:45
Speaker
I'm not going to delve too much into this topic, but one of the other pitfalls I see when it comes to data center cost estimates is the idea of writing off costs that seem to be sunk costs. These are statements like, well, we already have that space anyway, so it's free. It's not free. All right, so let's dive in here and look at how to drive that cost way down.
00:04:11
Speaker
Now, all of the advice here is based on real-world experience, and it's been refined by spending most of the last six years helping customers find the most cost-effective way to run on AWS. Now, point number one, if you have a dedicated solutions architect assigned to your account, they should be your best friend. Don't even think about kicking off a workload without running it past them.
00:04:34
Speaker
Now, why do I say that? And full disclosure, my job for most of my time at AWS has been a solutions architect. Well, the solutions architect gets paid by AWS, but they work for you. Seriously, one of the goals of a solutions architect is to help you reduce your costs. Now, whenever I would have a first meeting with a customer, one of the things I would say is, Amazon pays me to make sure you pay us as little as possible.
00:05:01
Speaker
Now I won't lie, that message has often been met with skepticism initially. But in every case, I've backed it up with some pretty substantial cost savings for my customers. The Solutions Architect's job is to make sure you're getting value for every dollar that you spend. We don't have sales quotas to make. We don't get incentivized by your spend level. There's absolutely nothing with dollars attached to our comp structure.
00:05:28
Speaker
Actually, that's slightly untrue. When it comes to the year end and we're talking about our achievements, we brag about how much money we saved our customers. Now, a solutions architect can't affect your account in any way. All we can do is provide you guidance and advice. So, of course, you're free to ignore anything that we tell you.
00:05:46
Speaker
But when it comes to costs, you're going to see what the projections are. You're going to see the math. It doesn't cost you anything to get that second set of eyes. And if you don't agree, challenge our assumptions, respectfully, of course. Our motivation really is to help you. Okay. Now I did say the solutions architect should be your best friend. If you have enterprise support with AWS and you have a technical account manager, then the solutions architect should be your second best friend. The TAM should be your first.
00:06:16
Speaker
And I say that because the TAM's job is to actually look at what you're doing and make even further recommendations
Optimizing Cloud Resource Utilization
00:06:24
Speaker
and help you drive down those costs. Now, point number two, discounts. I will tell you this from experience, the most effective discounts that you will get running in AWS are available to everybody. They are simple to implement and they require absolutely no negotiation. So now here's where things start to get interesting.
00:06:47
Speaker
The first step of the way for cost optimization is getting the right instance. And that has two metrics, the right instance family and the right size. Now, every customer that I have ever dealt with runs excess capacity on almost every server. For some customers, this is pretty lean and they might be running at 80% utilization. For others, I see numbers commonly in the 20 to 30% for CPU and memory utilization.
00:07:17
Speaker
And I get it. You're buying a server for the next five years. You need to buy a server that will handle the most aggressive capacity predictions that you could possibly imagine for the next five years. So people figure out the max load they can expect and they add a buffer and that's what they end up buying.
00:07:36
Speaker
I mean, going back and asking for more money after a year really doesn't look good. And if you're replacing a server after a year or two because it wasn't big enough, well, now you've got a secondhand server you've got to find a home for. Even with virtualization, the idea of changing the resource profile for a server can be daunting. What's the impact of allocating more resources to one server? Are you going to impact other servers running on that same host? Do you even have spare capacity?
00:08:05
Speaker
As a result, customers over-allocate resources on every server so that they don't get into these situations. Now operating in a cloud environment makes this much easier. Changing the resource profile is as simple as rebooting the instance. Also, because EC2 instances get dedicated CPU and RAM, there's no risk that changing the resource profile of one instance affects the performance of your other servers.
00:08:35
Speaker
Now, the first step we want to address with getting the right instance is the instance family. You need to understand what software is running on the server and where it gets hungry. Are we doing a compute-intensive task? In that case, we may want to opt for an instance like a C5 that has a higher CPU-to-memory ratio.
00:09:01
Speaker
On the other hand, if we're running something that's heavily memory dependent, we might want to opt for, say, an R5 instance that has more memory and less CPU. Getting the right family will allow you to further tune your resource mix and right size. Now, if you don't have any idea about the resource mix that's going to be required, I recommend starting out with an M family instance. This has got a relatively even mix between the CPU and memory.
00:09:32
Speaker
Then you want to monitor your instance as the workload runs. If you find that CPU is constantly high and there's very little memory pressure, you can swap to a C family. If you're running out of memory, but your CPU is sitting low, you can opt for the R family. If your disk is getting constantly maxed out, look at something with instance storage. Getting that right instance type will put you in a good spot for the next step.
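The family decision described here can be written down as a small rule of thumb. This is only a sketch: the function name and the 70% and 30% thresholds are assumptions picked for illustration, not AWS guidance, and the utilization numbers would come from whatever monitoring you already have.

```python
def suggest_family(avg_cpu_pct, avg_mem_pct, high=70.0, low=30.0):
    """Suggest an instance family letter from average CPU and memory utilization.

    The `high` and `low` thresholds are illustrative assumptions, not official values.
    """
    if avg_cpu_pct >= high and avg_mem_pct <= low:
        return "C"  # CPU is the bottleneck: compute-optimized family
    if avg_mem_pct >= high and avg_cpu_pct <= low:
        return "R"  # memory is the bottleneck: memory-optimized family
    return "M"      # no clear imbalance: stay on the general-purpose mix


print(suggest_family(85, 20))
print(suggest_family(15, 90))
print(suggest_family(50, 50))
```

Disk pressure is left out of the sketch; as mentioned, a constantly saturated disk is a reason to look at instance storage rather than a different CPU-to-memory ratio.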
00:10:01
Speaker
Now step two is right sizing the instance. Look at the utilization for your instance that you're running and see where it sits. If the CPU and RAM are constantly sub 50%, you should drop that instance size down. Dropping an instance from say a 4XL to a 2XL will cut your cost in half. Keep in mind that we don't need to plan capacity out for five years or more.
00:10:28
Speaker
If our needs change, we can change the profile of the instance with a few clicks in the console. So while you might want to run a small amount of headroom for performance, you still want to keep that utilization high. Now sometimes we have spiky workloads, something that needs a little bit of power most of the time, and at certain points needs some extra capacity. In these cases, maybe we can have a second, more powerful instance to do the work that we run at those special times.
00:10:58
Speaker
Maybe we can swap the hardware profile for the duration. If the software supports the idea of running in parallel, maybe we can add an extra instance for the period. These are all options that we can consider. As I said before, your solutions architect should be your best friend, and they can help you figure out the best way to address those specific issues. Now, the right size and the right family will save you a lot on your AWS bill.
00:11:24
Speaker
For many customers that I've worked with, this is the single biggest savings, and it's not uncommon for the projected costs to drop 40 to 50% just by right sizing alone.
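The right-sizing rule from a moment ago (drop a size when CPU and RAM sit under 50%, and a 4XL to 2XL move halves the cost) can be sketched as follows. The size ladder is a simplified assumption; real families offer different sets of sizes, and the doubling of price per step is an approximation of how on-demand pricing scales within a family.

```python
# Simplified size ladder; not every family offers exactly these sizes.
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def right_size(size, peak_cpu_pct, peak_mem_pct, threshold=50.0):
    """Step down one size when both CPU and memory stay below the threshold."""
    index = SIZES.index(size)
    if index > 0 and peak_cpu_pct < threshold and peak_mem_pct < threshold:
        return SIZES[index - 1]
    return size


def relative_cost(size, baseline):
    """Cost of `size` relative to `baseline`, assuming price doubles with each step up."""
    return 2.0 ** (SIZES.index(size) - SIZES.index(baseline))


new_size = right_size("4xlarge", peak_cpu_pct=35, peak_mem_pct=40)
print(new_size, relative_cost(new_size, "4xlarge"))
```

Stepping down only one size at a time is a deliberately cautious choice in this sketch: you resize, watch the utilization again, and repeat if there is still plenty of headroom.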
00:11:35
Speaker
Now, if all of this sounds a bit complicated, don't worry. There are ways that can help make this easy. The best way is to speak with AWS and get an optimization and licensing assessment, or OLA, done for you. This is a combination of software and services. It has a collector that monitors the utilization of your machines, and then it creates some recommendations for you based on the resource profile and utilization.
00:12:02
Speaker
The output is going to give you data on what your AWS footprint should look like, and it's generally pretty accurate. Now, there may be some edge cases where the recommendations need to be tweaked here and there, but it's a great starting point. Search for the AWS optimization and licensing assessment for details on the program.
00:12:25
Speaker
Now, if you don't want to dive directly into a full assessment, your hypervisor tools can probably provide you with a summary of your infrastructure and the average and peak utilization. That's the second option. The OLA, quite frankly, is going to be far superior. Now, I have regularly seen customers reduce their expected bills in the area of 50 to 60% with an OLA done, and some customers even higher.
00:12:50
Speaker
Alright, so now we have our instances all in the right family and at the right size. How do we cut the bill down even further?
AWS Savings Plans and Reserved Instances
00:12:58
Speaker
The next thing we want to understand is the actual time the instances are regularly being used. Instances are going to fall into one of two categories. Either stuff that runs more or less 24 by 7, or stuff that really only runs at specific times.
00:13:17
Speaker
In a data center, everything just stays on all the time because the cost of that machine is the same, whether it's on or off. But once again, cloud is different. First, let's have a look at the stuff that runs sporadically. Do you have an instance that's used to do some overnight data processing? Do you have some development machines that are really only used during business hours?
00:13:42
Speaker
For those sorts of machines, consider building a schedule that automatically turns them on and off at specified times. When an EC2 instance is turned off, all you pay for is the underlying storage for that instance. This can mean substantial savings. Look, there are 168 hours in a week. Let's say the people using your test network work Monday through Friday between 8am and 8pm.
00:14:08
Speaker
That's 60 hours. If those instances get turned off on weekends and during the off hours, your savings are about 65% of the cost of those instances. Now take this a step further. Let's say you have multiple test networks, but only one gets used at any particular time. Well, now that 65% savings can easily jump to 90% or better by keeping the test networks shut down, except for the one you are specifically using
00:14:38
Speaker
at the time it's being used. Now, this does take a little bit of time to get used to. I totally understand that. But by adopting these habits, there are substantial savings to be had. Now, for the instances that you have that need to be up and running most of the time, so things like your email server, your production database, stuff like that, there are two pricing constructs that you should be aware of. The first is an AWS savings plan, and the second is reserved instances.
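Going back to the scheduling arithmetic for a moment, here is a small sketch of the Monday to Friday, 8am to 8pm schedule and the compute savings it implies. The function names and the four-network scenario are assumptions for illustration; actually starting and stopping instances would be done with whatever scheduling tooling you choose, which is not shown here.

```python
from datetime import datetime

HOURS_PER_WEEK = 7 * 24  # 168


def should_be_running(moment, start_hour=8, end_hour=20):
    """True on Monday to Friday from start_hour (inclusive) to end_hour (exclusive)."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= moment.hour < end_hour


def compute_savings_fraction(days=5, hours_per_day=12, networks=1):
    """Fraction of compute hours saved versus running every network 24x7.

    Assumes only one network is on at any time during the scheduled hours.
    Storage is still billed while instances are stopped and is not modeled.
    """
    on_hours = days * hours_per_day
    return 1 - on_hours / (HOURS_PER_WEEK * networks)


print(should_be_running(datetime(2024, 1, 8, 9, 0)))    # a Monday morning
print(f"{compute_savings_fraction():.1%}")               # one test network
print(f"{compute_savings_fraction(networks=4):.1%}")     # four networks, one in use
```

With one network the saving comes out a little under two thirds, and with four networks where only one is ever on, it passes 90%, which lines up with the range described above.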
00:15:09
Speaker
These are both constructs where you get substantial discounts in exchange for a utilization commitment. Now let's say I have a database server. It's running an R5 8XL instance. That machine runs 24 hours a day, seven days a week, and it's going to be running for the next three years. If I use the option to cover this instance with what's called a reserved instance, I can save 33%.
00:15:35
Speaker
That's a pretty good discount. And I can do this completely self-service in the console. Now the other option I have is something called a savings plan. Now that same R5 8XL instance can be covered by a savings plan and I'll get a savings of about 28%. So what's the difference? Well the reserved instance ties you to a specific instance type and size in a specific region.
00:16:03
Speaker
Whereas with savings plans, you're only tied to a specific amount of spend. So the savings plan is more flexible if your needs change. Now, both reserved instances and savings plans are going to vary based on the regions and the resources that are covered by them. You want to do your homework and do some calculations before you jump into either one. What I tell every customer is this, however.
00:16:29
Speaker
don't rush into the idea of reserved instances or savings plans. Get your infrastructure into AWS first and make sure you have the right instance family and size. Give it a few months because you're likely to find that you need smaller instances in AWS than you actually did on premises. Once you've got those details, then look at your cost optimization elements
00:16:56
Speaker
in reference to reserved instances or savings plans. Now here's another trick that you need to be aware of to fully maximize your savings. Both reserved instances and savings plans are purely billing constructs. There's nothing that ties either option to a specific EC2 instance. What this means is this. Let's say you have 10 EC2 instances, but at any one time only five of them are active.
00:17:26
Speaker
If they all use the same instance type and family, and you shut down the instances when they're not in use, you can effectively cover all 10 instances with five reserved instances. So how does that work, right? Well, what happens is this. Every billing segment, AWS takes a look at what's running in your account. Then they compare that against the reserved instances you have and apply the discount.
00:17:51
Speaker
So, the five instances that are running have the five reserved instance savings applied to them. The five instances that are turned off, you don't get billed for the compute. If you do happen to have a couple of hours where a sixth or seventh instance gets turned on, you simply pay the on-demand cost for those couple of extra instances during that time. So in this case,
00:18:16
Speaker
by turning off the instances when not in use, and covering the running capacity with the reserved instances, that 33% savings turns into roughly a 67% savings compared with running all 10 instances on demand around the clock. Okay, so every time I start talking about reserved instances and savings plans, I get this question. How much work is it to convert an instance over to a reserved instance or a savings plan? The answer is none. There is no effort. It just happens.
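The billing behavior in the ten-instance example can be modeled with a few lines. This is a simplified sketch, not AWS's actual billing logic: it assumes every instance is the same type and size, uses a normalized on-demand rate of 1.0 per instance-hour, and treats the 33% reserved instance discount from the example as a flat rate.

```python
def hourly_compute_bill(running, reserved, on_demand_rate, ri_discount):
    """Compute charge for one hour under a simplified reserved instance model.

    Reserved instances are paid for whether or not a matching instance is running;
    anything running beyond the reserved count is billed at the on-demand rate.
    """
    ri_rate = on_demand_rate * (1 - ri_discount)
    overflow = max(0, running - reserved)
    return reserved * ri_rate + overflow * on_demand_rate


all_on_demand = 10 * 1.0  # ten instances running 24x7 with no reservations
five_covered = hourly_compute_bill(running=5, reserved=5, on_demand_rate=1.0, ri_discount=0.33)
seven_running = hourly_compute_bill(running=7, reserved=5, on_demand_rate=1.0, ri_discount=0.33)

print(f"five running:  {five_covered:.2f} per hour, saving {1 - five_covered / all_on_demand:.1%}")
print(f"seven running: {seven_running:.2f} per hour")
```

Under these assumptions the steady state of five running instances costs about a third of the always-on, all on-demand baseline, and an hour with a sixth and seventh instance simply adds two on-demand instance-hours on top. Real figures depend on the reservation term, payment option, and region.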
00:18:43
Speaker
As I mentioned, these are purely billing constructs. Once you have an RI or a savings plan in place, it just kicks in. Now, of course, there are some drawbacks to both reserved instances and savings plans. In the case of a reserved instance, you're paying for the instance whether it's running or not. So if you don't have an instance running that matches the size and type of the reservation, you're going to be wasting that money.
00:19:12
Speaker
With savings plans, you're committing to spending a certain amount of money. So if your spending falls below that threshold, you'll be wasting that money there too. This is why my previous advice about making sure to let things settle in before applying either an RI or a savings plan applies. Okay, let's talk about storage.
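The savings plan drawback just described, paying for the commitment even when usage falls below it, can be sketched like this. It is a deliberately simplified model: real savings plans apply different rates per instance type and region, whereas this assumes a single flat discount (the 28% from the earlier example) and a hypothetical hourly commitment.

```python
def savings_plan_hourly_charge(on_demand_usage, commitment, sp_discount):
    """Charge for one hour under a simplified, single-rate savings plan model.

    on_demand_usage: what the hour's eligible usage would cost at on-demand rates.
    commitment: the hourly spend committed to, which is charged regardless of usage.
    """
    discounted_usage = on_demand_usage * (1 - sp_discount)
    if discounted_usage <= commitment:
        return commitment  # unused commitment is still paid for
    # The commitment covers this much usage, expressed at on-demand value.
    covered_on_demand_value = commitment / (1 - sp_discount)
    # Usage beyond what the commitment covers is billed on demand.
    return commitment + (on_demand_usage - covered_on_demand_value)


quiet_hour = savings_plan_hourly_charge(on_demand_usage=5.0, commitment=7.2, sp_discount=0.28)
busy_hour = savings_plan_hourly_charge(on_demand_usage=20.0, commitment=7.2, sp_discount=0.28)

print(f"quiet hour: {quiet_hour:.2f} (on demand would have been 5.00)")
print(f"busy hour:  {busy_hour:.2f} (on demand would have been 20.00)")
```

In the quiet hour the commitment costs more than plain on-demand usage would have, which is exactly why it pays to let the workload settle before committing.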
Cost-Effective Use of AWS Services
00:19:34
Speaker
File servers, they're everywhere. I've never worked for a company that didn't have file servers sitting around. Sometimes they're well managed, other times they end up like that drawer where I keep the miscellaneous cables for devices like my Palm Pilot 5. These file servers end up storing all kinds of stuff. Some of it's important, some of it once was important,
00:19:56
Speaker
but hasn't been looked at in years, like my Palm Pilot cable. And some of it is just junk pictures of cats off the internet that somebody downloaded and mailed out to everybody in the company. In this case, a managed service such as Amazon FSx for Windows File Server can be a huge savings over building a file server on an EC2 instance. An EC2-based file server is really just going to be sitting there drastically underutilized.
00:20:22
Speaker
Now, if this is truly just a file server and maybe performance isn't terribly important, you can leverage FSx for Windows File Server with magnetic drives in a single availability zone and you're talking 1.3 cents per gig of space per month. That's going to be substantially cheaper than spinning up an EC2 instance in this case.
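As a quick back-of-the-envelope check on that storage price, the sketch below multiplies the 1.3 cents per gigabyte figure by a file share size. The 2 TB share is a made-up example, the price is treated as a monthly rate, and other charges such as throughput capacity and backups are not modeled, so treat the output as a lower bound rather than a quote.

```python
PRICE_PER_GB_MONTH = 0.013  # the 1.3 cents per gig figure quoted above


def monthly_storage_cost(size_gb, price_per_gb=PRICE_PER_GB_MONTH):
    """Storage-only monthly cost; throughput, backups and data transfer are not included."""
    return size_gb * price_per_gb


share_size_gb = 2 * 1024  # hypothetical 2 TB departmental file share
print(f"${monthly_storage_cost(share_size_gb):.2f} per month for storage")
```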
00:20:43
Speaker
Of course, you'll want to consider availability and performance factors, but pick the right options for the specific use case. Don't over-engineer something that's just going to be holding some user files. And of course, you can have multiple FSx file systems with different characteristics for different workloads. Okay.
00:21:05
Speaker
All of these figures are very much situationally dependent. The actual results will vary based on workload, the regions, et cetera, but none of these require any negotiation on your part and they're all super easy to implement. You could implement them today if you wanted to.
00:21:21
Speaker
Now, another thing you really need to look at in your account is something called Trusted Advisor. This is a service that runs on your account and it will actively tell you if things are underutilized and you should consider either turning them off or downsizing them to save money. If you've never looked at Trusted Advisor, go into the console and have a look. One stipulation here,
00:21:43
Speaker
To get the most out of Trusted Advisor, you need to have business level support enabled in your account. But if you're running mission critical workloads in AWS, you really should have business support on anyway. Now there's one final thing I want to discuss when it comes to optimizing for cost.
00:21:59
Speaker
Sometimes we have jobs where the timing isn't terribly critical.
Benefits of AWS Spot Instances
00:22:03
Speaker
You know, maybe we have a large data processing job, for example, that we know we need the data to be crunched. We know we need it sometime in the next couple days, but when isn't terribly critical.
00:22:16
Speaker
There is a special class of compute in AWS called Spot. Spot is basically the capacity that nobody else is using and it's offered at substantially discounted rates. Now the actual rates fluctuate depending on the capacity that's available, et cetera. But I've routinely seen customers saving 70% and greater. Now the one catch, you have to be able to engineer the tasks that you're running on those instances so that they can be interrupted.
00:22:44
Speaker
Spot instances can be taken away if the capacity is needed elsewhere. So spot really excels when you have a queue of work to process. The data processing can take a unit of work from the queue, process it, and then store the results, then grab the next unit of work. If the process gets interrupted, you simply pick up where you left off when things come back online.
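The queue pattern described here can be illustrated with a small in-memory sketch. Everything in it is a stand-in: the `Interrupted` exception simulates a Spot interruption, and the `deque` and `dict` take the place of a durable queue and durable result storage, which a real workload would need so that state survives the instance going away.

```python
from collections import deque


class Interrupted(Exception):
    """Stands in for a Spot interruption in this sketch."""


def process_queue(queue, results, handle, interrupt_after=None):
    """Process work items one at a time.

    An item is removed from the queue only after its result has been stored,
    so an interruption never loses work; at worst one item is redone.
    """
    done = 0
    while queue:
        if interrupt_after is not None and done == interrupt_after:
            raise Interrupted()
        item = queue[0]               # peek at the next unit of work
        results[item] = handle(item)  # process it and store the result
        queue.popleft()               # only now take it off the queue
        done += 1


queue = deque(range(5))
results = {}

try:
    process_queue(queue, results, lambda n: n * n, interrupt_after=2)
except Interrupted:
    pass  # capacity was reclaimed; unfinished work is still on the queue

# A replacement instance comes online and picks up where things left off.
process_queue(queue, results, lambda n: n * n)
print(results)
```

The design choice that matters is the ordering: store the result first, then remove the item. That makes each unit of work safe to repeat, which is the property a Spot workload needs.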
00:23:09
Speaker
Now, all the other advice I gave was extremely simple. Spot does take some effort to be effective. You need to build that ability for the workload to survive being interrupted. However, if you can make those changes, you can make some substantial savings with Spot. Okay.
Conclusion and Listener Interaction
00:23:28
Speaker
So with that, I'm going to wrap up this episode of the Basement Programmer podcast. I hope you enjoyed it and got some value. If you did, please consider subscribing. As always, if you have any comments about the podcast,
00:23:39
Speaker
or any suggestions about topics you'd like me to cover, please feel free to email me. The address is Tom, that's T-O-M, at basementprogrammer.com. I'm always happy and eager to receive any feedback. Thanks for listening, and I'll catch you next month.