Slack's AWS Availability Zone Challenges
00:00:00
Speaker
1% packet loss. It's not enough to take you down, but it's enough to ruin your day. Slack had this problem. AWS had one flaky AZ, and it was clear that users were having a bad time.
00:00:11
Speaker
But draining an AZ meant removing a third of the site's capacity. It didn't feel right. So they waited. Sure enough, Amazon fixed the problem. Then it happened again, four hours later. That's when Cooper Bethea decided: we can't do this.
00:00:24
Speaker
In this episode, Cooper shares how he led Slack's move to a cellular architecture, turning draining an AZ from high-risk theory into a weekly activity. Cool.
Introduction to Cooper Bethea
00:00:34
Speaker
So on today's episode of Turn Stories, um I've got Cooper Bethea, and he's going to tell us about his experience migrating Slack to a cellular architecture.
00:00:44
Speaker
He was a senior staff engineer on the traffic and service discovery team there. Super happy to have you on the show. Thanks, T.R. Actually, my last name is Bethea.
00:00:55
Speaker
Bethea. I'm only mentioning this because I know my parents will watch the podcast. And they'll be like, he didn't even get your name right, son. You know, one of the things I'm learning is that I almost never say out loud the last names of people
Early Challenges at Slack
00:01:14
Speaker
that I've met. so Why would you?
00:01:18
Speaker
Why would I? Cooper Bethea. Apologies. but It took me a long time to learn your name was Thomas Rasputin. It's something like that.
00:01:29
Speaker
So the topic we're going to cover today is something that you've actually talked about a bunch. You've given a talk on this at QCon.
00:01:43
Speaker
I want to start a little bit before the actual story of, you know, transitioning Slack away from the architecture of your dreams. What did Slack look like when you showed up?
00:01:57
Speaker
What were the major fires? Yeah. My recollection is I landed at Slack in about April 2019. And I think some of the major fires that we would get wrapped up in at first, a lot of them were particularly around the Consul installations.
00:02:21
Speaker
We were using Consul as our sort of back end for service registration and discovery. And we would often have load- or failure-induced issues with these Consul clusters that would just render them useless, both for publishing and reading service discovery information.
00:02:41
Speaker
That was a really big focus of ours for a while. We went through a lot of rounds of doing things like just making the Consul clusters bigger and trying to defend them in various ways.
00:02:54
Speaker
That was a big focus for probably about the first year. What sort of lessons did you take away from that? Was there anything leading into the cellular Slack project? Were there pieces of the system, you know, dark skeletons in the corner, that you'd figured out, or was it truly separate work?
00:03:18
Speaker
There was actually a lot that went on to inform the work we did in cellular Slack. I remember one of the early wins that we got was around...
00:03:33
Speaker
So we came to realize that a lot of the instability in Consul was driven by the scaling up and down of our app servers,
Transition to xDS Control Plane
00:03:42
Speaker
right? What we would call web apps at Slack, which are just kind of stateless HTTP servers that do most of the business logic.
00:03:50
Speaker
And so as they needed to discover services from Consul, they would add load to Consul directly in the form of these watches. This was difficult because you have to maintain availability of the Consul cluster the whole time. And with a Consul cluster, the size of the cluster is five. It's a consensus-driven system.
00:04:12
Speaker
It offers strong consistency. And so we kept just vertically scaling the nodes in the Consul cluster. But it was messy, we were running out of nodes, and we still weren't really happy with the stability picture. So we ended up bridging it over to this xDS control plane, which is part of the Envoy ecosystem.
00:04:34
Speaker
So we published a read-only copy of the Consul data into that system, which was eventually consistent as opposed to strongly consistent. And that gave us a much freer hand to scale that read service out by separating it from the write surface of Consul.
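To make that split concrete, here is a minimal Go sketch of the idea: a single poller reads from Consul and publishes an eventually consistent, read-only snapshot that any number of clients can query without touching Consul's consensus-backed write path. The names and the fetch function are illustrative, not Slack's actual implementation.

```go
// Sketch only: decoupling the read path from Consul's strongly
// consistent write path with an eventually consistent, periodically
// refreshed snapshot. fetchFromConsul is a stand-in, not a real API.
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Endpoint is one registered instance of a service.
type Endpoint struct {
	Service string
	Addr    string
	AZ      string
}

// Snapshot is an immutable view of the service catalog.
type Snapshot struct {
	FetchedAt time.Time
	Endpoints map[string][]Endpoint // keyed by service name
}

// fetchFromConsul stands in for one bounded read against the Consul
// cluster. Only the poller calls it, so read fan-out from thousands
// of app servers never reaches Consul itself.
func fetchFromConsul() Snapshot {
	return Snapshot{
		FetchedAt: time.Now(),
		Endpoints: map[string][]Endpoint{
			"webapp": {
				{Service: "webapp", Addr: "10.0.1.10:443", AZ: "us-east-1a"},
				{Service: "webapp", Addr: "10.0.2.10:443", AZ: "us-east-1b"},
			},
		},
	}
}

// Registry serves reads from an atomically swapped snapshot.
type Registry struct {
	current atomic.Pointer[Snapshot]
}

// Poll refreshes the snapshot on an interval; readers tolerate
// staleness of up to one interval (eventual consistency).
func (r *Registry) Poll(interval time.Duration) {
	for {
		snap := fetchFromConsul()
		r.current.Store(&snap)
		time.Sleep(interval)
	}
}

// Lookup is lock-free and never touches Consul.
func (r *Registry) Lookup(service string) []Endpoint {
	snap := r.current.Load()
	if snap == nil {
		return nil
	}
	return snap.Endpoints[service]
}

func main() {
	var reg Registry
	go reg.Poll(5 * time.Second)
	time.Sleep(100 * time.Millisecond) // let the first poll land
	fmt.Println(reg.Lookup("webapp"))
}
```

The trade-off is the one described above: readers may see data that is up to one refresh interval stale, in exchange for a read path that scales independently of the Consul cluster.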
00:04:56
Speaker
Yeah, that makes sense. That's...
00:04:59
Speaker
It's definitely a challenging thing to scale. And I know Slack was not the only ones to have struggled with scaling Consul vertically and adding that capacity in a system where you've got it attached to your monolith or whatever.
00:05:21
Speaker
Yeah, it's tough because Consul is really set up to have two jobs that don't necessarily go together, right? You can use it as a service registration and discovery interface, where you're just pushing data into it about what servers are available and what they are, but you can also use it as a lock server, which is where a lot of the needs around consensus and strong consistency come in. The problem of service registration and discovery is actually much simpler if you relax those consistency guarantees.
00:05:59
Speaker
Yeah, absolutely. And that was the pickup you saw with the read-only layer, right? Yeah, with the read-only layer, we could just scale that layer separately. Cool. So with that at the core of things, tell me a little bit about what you saw that led into Cellular Slack and that effort.
Strategies for Draining AWS AZs
00:06:16
Speaker
So we had always had this idea...
00:06:23
Speaker
At Slack, we had talked for a long time about, well, we should be able to drain an availability zone. It was something that we thought we could do. AWS advertises that these availability zones have, I think, about three nines of availability.
00:06:39
Speaker
Something like that. So there was always this idea that maybe one day an availability zone will vanish entirely and we'll have to drain the site away from it.
00:06:51
Speaker
There were even some nods in the setup to that. There's sort of a virtuous incentive in AWS, where cross-AZ data transfer is actually
00:07:02
Speaker
a big creeper on people's cost sheets. So we had already set up a lot of the infrastructure to funnel traffic through the individual AZs to some extent, mostly to save on costs.
00:07:16
Speaker
But then we actually went through a few outages where we saw a failure and were fairly certain that the failure was confined to a single availability zone.
00:07:30
Speaker
But we were really unwilling to actually start trying to administratively remove traffic from that availability zone. Mostly we'd kind of like sit around and be like, ooh, is it really bad enough that we need to do this?
00:07:45
Speaker
Because we weren't in the practice of removing traffic from an availability zone, and even if the availability zone is impacted, we're talking about removing 33% of the overall capacity of the site, which is intimidating if you haven't done it before.
00:08:04
Speaker
Absolutely. Was that the sole motivation? It sounds like there was some nervousness around it. You know, I think Slack had a pretty good handle on its capacity in a bunch of ways.
00:08:16
Speaker
But still, where do you think that nervousness was coming from? I think it was from a few places. Well, there was a sense of nervousness combined with some futility.
00:08:29
Speaker
So, right, as I mentioned, we had things kind of funneled into the app servers. But then there is this large fan-out of connections the app servers actually need to make to do their work,
00:08:44
Speaker
including into systems like Vitess, which are strongly consistent and aren't trivial to just replicate across availability zones. So at least part of it was, well, we don't even really know what this would do, right? We can pull stuff out of the app servers, but then the app servers in the other two availability zones will just be spreading traffic everywhere anyway.
00:09:05
Speaker
And then, you know, we know that for the most part the scaling of the site is driven by the CPU consumption of the app servers. And as mentioned before, we're still talking about removing 33%. So part of it was this kind of theoretical worrying, and also we weren't in a place where you could get a bite at the problem.
00:09:30
Speaker
We didn't have a good idea of how we might start figuring out how safe it really was. Yeah. So it's almost like removing an AZ is not something that even had a really clear button. You mentioned moving just the traffic away from the app servers, and yeah, that's a lot of Slack, but that's not the whole site.
00:09:57
Speaker
Yeah, exactly. I think the best we could have done, if we wanted to dry run it, is effectively get people from many teams into a channel somewhere and run a long experiment where we tried to drain things. And, yeah, I think we never...
00:10:23
Speaker
Yeah, I think we never felt that it was worth all the effort and coordination without a plan of what to do about it. Yeah, absolutely. And certainly, if your default plan is to get a bunch of people in a room and have them do something they haven't done before, I can imagine that the last time people want to do that is when the site's at a 5% error rate and you're not sure if it's getting better or worse. Yeah.
00:10:51
Speaker
Yeah, if you haven't done something before, you don't want to do it for the first time ever in an incident. Sometimes you have to, but no one likes that. Yeah, it's undesirable.
00:11:02
Speaker
That makes sense. That totally makes sense. So, okay.
00:11:07
Speaker
So draining the site, or draining an AZ, is not something that feels useful or even plausible in most cases, but we were having these incidents.
Need for Robust AZ Draining Strategy
00:11:21
Speaker
Was there a particular incident that was the trigger to start doing work on this, or a particular insight that pushed you over the edge to actually start the project?
00:11:33
Speaker
Yeah, the main one that I recollect was an incident where we had a graph of TCP retransmits by availability zone.
00:11:46
Speaker
And it was pretty obvious from the shape of the drop in the graph that we had trouble connecting from the other availability zones into just one.
00:11:59
Speaker
It was actually quite clear from the monitoring that there was something going on at a network level, in the sort of interfaces. I think it was one of the Amazon
00:12:10
Speaker
components. We believed, and eventually found out, it was the Transit Gateway fronting the whole availability zone. So we were in a situation that was pretty clear-cut: if we could ever have wanted to drain, we would have wanted to do it at that point. And we were kind of like, oh, should we start trying to do this? Also, we were a little bit worried about being able to get into that availability zone at all. Our procedure at the time would have involved SSHing around to a bunch of different servers and causing the load balancers to fail their health checks.
00:12:50
Speaker
I think we were kind of talking about it, worrying about it, slash plotting to do it. And then the availability zone came back. The component got fixed. And we were like, yay, okay.
00:13:03
Speaker
Well, we didn't have to do that. Yeah, but then four hours later, the broken component accidentally got put back in again. And we were like, oh...
00:13:15
Speaker
Now, once again, we were faced with a difficult choice. You know, it just felt bad. There was a certain level of frustration at having been put in this situation, feeling kind of powerless twice in, I think, the same day. That was a big psychological motivation.
00:13:36
Speaker
What did that show up to Slack's users as? Did this actually cause downtime or other issues? Yeah. So I can't remember what the SLA calculation ended up being.
00:13:53
Speaker
It was a thing where, you know, we have a lot of retries and stuff embedded in the site. And we were certainly not fully down. We were serving, let's say, an integer percentage of errors, maybe 1%-ish.
00:14:05
Speaker
Something like that. So people were having a pretty annoying time using Slack, but the service wasn't hard down, which, you know, if we were in that bad a state, I think we would have been much more aggressive about trying to do stuff to bring things back.
00:14:24
Speaker
Yeah, that's a tough regime, because support is certainly feeling it at that point. Yeah, exactly. Where you're like, well, we could do this crazy thing, but only if something even worse has happened.
00:14:35
Speaker
It is kind of a desperation-driven feeling. Yeah, it's freeing if the site is hard down. You can't make it worse. But at 99% success, you certainly can.
00:14:47
Speaker
Very much so. And I want to come back to the question of Transit Gateway and how that works. But okay, so after that, what was the fallout? All right, you saw this error, felt kind of powerless.
00:15:00
Speaker
It was clearly a recurring set of issues. What do you do about it? Well, you write an angry email. I mean, I'm kidding, but actually, you know, I talked about it with a couple of people that I was in the incident with, and we were all just like, oh, we don't really feel very good about this. So I just ended up
00:15:24
Speaker
writing a document about it called, I think, "We should be able to drain an AZ." And I just kind of circulated it around. It was basically like, look, we just had this incident. We're doing some analysis.
00:15:40
Speaker
We believe that if we had this sort of technology, if we could drain an availability zone, which we've always told ourselves that we could do if we really had to...
00:15:51
Speaker
If we really had that, then we could have just done it. And so it was kind of a skeletal outline, I think, of how we might reasonably do this for most services and what some of the benefits would be.
00:16:08
Speaker
There was this perception, I think, originally, that draining a service or draining an availability zone would be something we'd only do in very, very dire circumstances, you know, like doing something very heroic to save the site. And I was like, well, actually we can make this a more commonplace event, right? The safe way to do this is to make it a behavior that can happen, that does happen, in the system all the time, and does not require us all to exist in this state of panic in an incident channel,
00:16:42
Speaker
um because we should be very confident in it.
Integrating AZ Draining into Workflow
00:16:44
Speaker
And once we do that, it becomes a more natural part of workflows, you know, like deploying and troubleshooting. You can get into this place where, well, something is bad in one availability zone, so let's drain all the traffic out of it. Then fix whatever it is and put, like, 1% of traffic back, something like that.
00:17:05
Speaker
You know, if that doesn't work, drain it all out again, put 1% of traffic back. I felt there was an opportunity for us to stop operating in this kind of
00:17:18
Speaker
binary event mode, where either things were good and everything was operating, but we couldn't really do anything, or everything is bad, but we can do anything because everything is already down. There's an opportunity to explore some space in the middle, I think. Yeah. That makes sense.
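The drain-fix-undrain loop Cooper describes could look something like the sketch below in tooling form; the function names, step sizes, and thresholds here are invented for illustration, and the real workflow was driven by people watching dashboards.

```go
// Sketch of the "drain it, fix it, put 1% back" loop described above.
// setAZWeight and errorRate stand in for real control-plane and
// monitoring calls; the doubling schedule and thresholds are made up.
package main

import (
	"fmt"
	"time"
)

// setAZWeight tells the load balancing layer what share of its normal
// traffic an AZ should receive (0 = fully drained, 100 = normal).
func setAZWeight(az string, percent int) {
	fmt.Printf("%s set to %d%% of its normal share\n", az, percent)
}

// errorRate pretends to query monitoring for the AZ's error rate.
func errorRate(az string) float64 { return 0.001 }

// undrainGradually ramps traffic back in, bailing out to a full drain
// if errors reappear at any step.
func undrainGradually(az string) {
	for pct := 1; ; pct *= 2 {
		if pct > 100 {
			pct = 100
		}
		setAZWeight(az, pct)
		time.Sleep(time.Second) // let metrics settle; much longer in reality
		if errorRate(az) > 0.01 {
			setAZWeight(az, 0) // still broken: drain it all out again
			fmt.Println("re-drained", az)
			return
		}
		if pct == 100 {
			break
		}
	}
	fmt.Println(az, "fully undrained")
}

func main() {
	setAZWeight("us-east-1a", 0) // drain while the fix goes out
	undrainGradually("us-east-1a")
}
```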
00:17:39
Speaker
Yeah. To what extent was this informed by... you were at Google before this, and Google famously has a gazillion data centers all over the world. And what I've heard talking with engineers there is that they think about draining their services as part of their job.
00:17:56
Speaker
To what extent were you informed by that experience, and were there certain parts of that experience that you were trying to carry over, or certain parts that you were trying not to carry over?
00:18:08
Speaker
Yeah, very much so. You know, Google is definitely the first place where I encountered this idea that one could sort of drain a data center.
00:18:19
Speaker
And the team I was on at Google, the traffic team, among other things ran the load balancing infrastructure that enabled this kind of draining. So I think one notable difference between the way this was done at Google and the way we chose to do it at Slack was informed by the size of the company and the complexity of the services.
00:18:47
Speaker
So at Google, you're a service owner. You probably have pretty much complete autonomy over where you deploy your service, as long as you're meeting your SLAs. Google has many data centers, and you don't necessarily know where all your backend services are.
00:19:04
Speaker
There's a high level of decoupling, right? Under the covers, there are hundreds and thousands of services running at Google. So as a service operator, people underneath or above you are draining from around you all the time. And that kind of service-by-service drain is more common than actually shutting down a whole data center; there's a large cost dimension to draining an actual whole data center.
00:19:42
Speaker
So they tend to be a little more judicious about draining a whole data center. In contrast, at Slack, we were a large service, but we were not, you know, many-data-centers large.
00:19:55
Speaker
And we also had one to two orders of magnitude fewer services. So there was a lot more opportunity. And whereas at Google they have, you know, a million user-facing services, Gmail, search, ads, whatever,
00:20:12
Speaker
we really just had Slack. We had auxiliary parts of Slack, but mostly we cared about the service as a whole. And most components were shared across the whole service. So when we did the program at Slack, we made it...
00:20:26
Speaker
Maybe collective is the right word, but it was a little bit more of a global regime. I think the first thing we focused on was not how we would drain individual services, but how we would just clear everything out of an availability zone.
00:20:43
Speaker
And then later, I think, we went back and started adding in more per-service draining. Yeah, but then, you know, the core piece of Slack that everyone sees is: can I send messages and can I hop around my channels? And that's all kind of one big blob of code that's extremely tightly intertwined, because it has to be. It can't be separate things.
00:21:05
Speaker
Yeah, and that's almost everything. There's a reverse proxy, there's an app server, there's a memcache, and there's some Vitess components. Almost every request that goes through the site is going to touch those things.
00:21:16
Speaker
Yeah. And that makes sense: you make your global decision and you say, we should be able to drain Slack out of an AZ, not, you know, memcache out of an AZ, that probably wouldn't help.
00:21:27
Speaker
I mean, sometimes I did, but yes. Touché. Yeah.
00:21:40
Speaker
No, sorry. God. Cool. So yeah, Angry email.
AZ Draining Proposal and Experiments
00:21:46
Speaker
A bunch of people read it. I love the simplicity. I remember actually seeing that posted and thinking, this makes a ton of sense to me.
00:21:53
Speaker
How did folks respond to it? And where'd you go from there? Well, I mean, I think...
00:22:05
Speaker
I'm trying to remember how it picked up. And I think maybe you were around for part of this? I'm not sure. I circulated it, and everyone I showed it to was like, oh yeah, this is kind of a good idea. This seems good.
00:22:18
Speaker
I can't exactly remember how we leveraged that into starting a project, except I think we got... yeah, I think we got permission to try a little bit and see how it went.
00:22:35
Speaker
You know, I think we got to a place where we could at least start draining traffic from the front end and not worry about the fan-out behind it, and just see what happened. See if the load balancing infrastructure does the right thing, see if it works even if we only remove maybe five or ten percent of traffic, something like that. We all kind of agreed that that would be enough.
00:23:05
Speaker
You know, that would be an experiment that was valuable but not too risky, I think. Yeah. Got it. What were you trying to learn in that experiment? Because this is one of the things that I really like to ask folks in this conversation, because every big migration has this just-so story around it of, and then we did this hugely ambitious thing.
00:23:28
Speaker
And it's like, okay, but there's almost never a moment where, looking forward, the obvious move is some hugely ambitious thing for any given company to do. It's very much retconned into, well, and then we figured it out. So at the beginning, there's always a lot more ambiguity.
00:23:47
Speaker
And you mentioned this as an experiment. What were you hoping to learn? And where did you think it would go at that point, just by investing a little bit of work?
00:23:59
Speaker
Yeah, yeah. I think what we were trying to learn is, you know, we'd been through this experience with the outage that we talked about earlier. Excuse me.
00:24:12
Speaker
Where we were like, well, we would have drained that AZ if we could have. And so it was kind of like, well, can we practice this in peacetime, as it were?
00:24:25
Speaker
Can we just start doing some of the stuff that we would have done? And we'll see a couple of things, right? We'll see the main site statistics, latency and error rates, things like that.
00:24:37
Speaker
But then we'll also get an opportunity to see, does our machinery for that work? And we'll get to see what is left behind. One of the patterns in this project, I think, is that we iterated on these drains every week or so as time went on.
00:24:55
Speaker
And we had a couple of lines on the graph that indicated how much bandwidth we were dealing with in the AZ that we were draining, and moreover they were split out by service. So that was a way for us to see bandwidth per service as a kind of proxy for how much action is still happening despite the drain. Yeah, that makes sense. That was a way for us to kind of nibble at it.
00:25:23
Speaker
Got it. And that wasn't a known thing beforehand: you start to drain away from the front, and you weren't sure how much of the traffic that was going to get. Well, we knew roughly how much we could take at a time just based on some math, but we didn't know anything about what would happen once it hit the app servers, because of the high degree of fan-out, the many services behind them. For example, for memcache and the memcache configuration we had, we were quite sure we wouldn't be able to remove any traffic.
00:25:54
Speaker
We wanted to see it anyway. Yeah, that makes sense. So how did you actually do this draining?
00:26:07
Speaker
Tell me a little bit about... you know, we've talked generally, it's like shifting traffic or whatever, but what does it mean to actually do this? Yeah. Well, I will say that the mix of techniques we used evolved over time.
00:26:23
Speaker
Good to hear that. So the first thing we did... you remember I told you we had this load balancing configuration where we had separate reverse proxies fronting each availability zone, and they fed traffic only to upstream web apps in their availability zone.
00:26:49
Speaker
So the first thing that we did is we actually just started going around and um
Challenges and Tooling Improvements
00:26:55
Speaker
causing the reverse proxies to NACK health checks.
00:27:00
Speaker
So they just would not get traffic assigned anymore. Each one that we did that to in that availability zone would not get assigned traffic. And the outermost load balancing layer saw each reverse proxy as valid and equal; it wasn't really AZ-aware.
00:27:18
Speaker
You just go from having, like, 99 total reverse proxies to 98, 97, or whatever. And so that was the idea: by causing those at the top of the AZ to fail their health checks, the AZ in its entirety would get proportionally less traffic.
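As a rough illustration of that "fail the health check to shed traffic" mechanism, here is a tiny sketch with invented paths and ports; it is not Slack's actual proxy code.

```go
// Minimal sketch of "drain by failing health checks": an admin toggle
// flips the health endpoint to 503 so the outer load balancer stops
// assigning traffic to this proxy. Paths and port are made up.
package main

import (
	"net/http"
	"sync/atomic"
)

var draining atomic.Bool

func main() {
	// The outer load balancer polls this; NACKing it removes the node.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	})

	// An operator (or tooling SSH'd onto the box) toggles drain state.
	http.HandleFunc("/admin/drain", func(w http.ResponseWriter, r *http.Request) {
		draining.Store(r.URL.Query().Get("on") == "true")
		w.Write([]byte("drain set\n"))
	})

	http.ListenAndServe(":8080", nil)
}
```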
00:27:38
Speaker
Got it. How well did that work? It sounds fairly simple. Yeah, it worked decently well. There was a sort of fineness of control that was lacking, though.
00:27:53
Speaker
It wasn't extremely fast, right? We had to go around to a bunch of different servers and run a command. And then there's this thing where at some point you'll start running afoul of the sanity checks in the load balancing layer itself.
00:28:12
Speaker
For example, Envoy has this idea of panic mode, where you can configure Envoy so that if it sees, say, 33% of its backends are bad, it'll be like, whoa, my health information must be bad.
00:28:26
Speaker
The safest thing for me to do is just start spraying traffic across all the backends regardless of health status, because I've gotten some bad information. And we did run afoul of that once or twice. I think that was the problem.
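The panic behavior being described boils down to roughly the logic below; Envoy's real healthy panic threshold is configurable (it defaults to 50%) and is more nuanced than this sketch, so treat this as an illustration only.

```go
// Sketch of the panic-threshold behavior described above: if too few
// backends look healthy, the balancer distrusts its health data and
// sprays traffic across everything. Administrative drains that NACK
// health checks count against the same threshold, which is the trap.
package main

import "fmt"

type Backend struct {
	Addr    string
	Healthy bool
}

// eligible returns the set a load balancer would route to.
func eligible(backends []Backend, panicThreshold float64) []Backend {
	var healthy []Backend
	for _, b := range backends {
		if b.Healthy {
			healthy = append(healthy, b)
		}
	}
	frac := float64(len(healthy)) / float64(len(backends))
	if frac < panicThreshold {
		// Panic mode: ignore health status entirely.
		return backends
	}
	return healthy
}

func main() {
	backends := []Backend{
		{"az-a-1", false}, {"az-a-2", false}, // drained AZ
		{"az-b-1", true}, {"az-b-2", true},
		{"az-c-1", true}, {"az-c-2", true},
	}
	// With a 50% threshold this stays out of panic (4/6 healthy); a
	// whole-AZ drain plus a few real failures could tip it over.
	fmt.Println(eligible(backends, 0.5))
}
```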
00:28:38
Speaker
Yeah, I think that was probably the biggest one. Which actually, you know, in the environment of these experiments was fine, because the underlying infrastructure was actually healthy.
00:28:51
Speaker
But yeah, it brought us to a point where pagers were going off and the system was not acting like we wanted it to. Yeah, it had reverted to treating everything as healthy because somebody was lying to it.
00:29:04
Speaker
That somebody was you, NACKing down services or individual load balancers, but maybe that's not the fight with the computer you want to have. Yeah. We're kind of conflating the signal of "this backend is healthy" with "this backend should be assigned traffic."
00:29:23
Speaker
Yeah, that makes sense. You want to separate those. Okay, so a couple of imperfections in that, and you mentioned this evolved. What was the evolution of that tooling?
00:29:36
Speaker
So in the end, what we actually ended up doing was... we had introduced the xDS layer, whose implementation at Slack was called Router. We had introduced that as kind of a read-only store for service discovery information from Consul.
00:29:55
Speaker
But the other job of that protocol is to serve configuration information dynamically to Envoy processes. And there is actually a relatively simple way in the Envoy system where you can just say, well, assign proportionally less traffic to this zone.
00:30:15
Speaker
So we actually embodied everything in the Envoy configuration and then just pushed signals. I think the drain signal actually got written into Consul at some point and then replicated back into the xDS system.
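Conceptually, the knob turns a per-AZ drain fraction into the weights the control plane pushes to Envoy. A hedged sketch of that arithmetic, not Envoy's actual locality-weighting code:

```go
// Sketch of the "knob between 0 and 100%" idea: the control plane
// converts a per-AZ drain fraction into locality weights that it
// serves to Envoy over xDS. The arithmetic here is illustrative.
package main

import "fmt"

// localityWeights converts drain fractions (0 = take full share,
// 1 = fully drained) into integer weights for each AZ.
func localityWeights(drain map[string]float64) map[string]uint32 {
	const base = 100
	weights := make(map[string]uint32)
	for az, d := range drain {
		if d < 0 {
			d = 0
		}
		if d > 1 {
			d = 1
		}
		weights[az] = uint32(base * (1 - d))
	}
	return weights
}

func main() {
	// Drain 100% of traffic out of us-east-1a; the other AZs absorb it
	// in proportion to their remaining weights.
	fmt.Println(localityWeights(map[string]float64{
		"us-east-1a": 1.0,
		"us-east-1b": 0.0,
		"us-east-1c": 0.0,
	}))
}
```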
00:30:29
Speaker
So it was more just like a knob between zero and 100% of traffic. Cool. What did that granularity get you? I imagine Slack had enough load balancers in each AZ that it wasn't that bad, but you did mention that increment was meaningful.
00:30:48
Speaker
How did you see that show up? Yeah, honestly, mostly it eliminated a lot of difficult math around maint'ing and un-maint'ing servers.
00:31:02
Speaker
We could get most of the effect that we wanted. I can't remember exactly how many reverse proxies we had in each availability zone, but I feel pretty certain it was on the order of 100.
00:31:16
Speaker
So once we implemented the proportional traffic shifting through Envoy, we had like 1% granularity. But before that, I think at worst we probably had like 3%, something like that. It was just a bit of a messy control interface.
00:31:32
Speaker
And it also relied on stuff like being able to SSH around. In particular, if we were doing this for real, we wanted to SSH to every server in the impacted availability zone, which is a lot to ask if maybe your network is totally hosed.
00:31:47
Speaker
Probably a lot to ask in an availability zone that is presumably having network connectivity issues. That's a much better way of saying what I said. Interesting. Yeah. And I guess 3% doesn't sound too awful, but the real number was probably something like 2.7%, and knowing whether five maint'ed-down servers times 2.7% means the graph looks right: I certainly wouldn't want to do that math in the middle of an incident.
00:32:15
Speaker
Yeah. And then at some point you're just going to fall into panic routing anyway, and then you'll be in a mess. Yeah, no good there. Cool. So how did that tooling... was that tooling part of the original experiment, or was this sort of beyond that? Had you finished the initial experiment and learned what you needed to learn?
00:32:38
Speaker
Yeah, I think we did the first few of these... my recollection is we did a few of these experiments to the point where we felt like we had some things to go and work on with some of the different services underneath.
00:32:55
Speaker
And we were like, okay, now that we've seen there's some good work coming out of this, we can step back and do a bit of work in the load balancing layer to make this a nicer, more continuous process.
00:33:10
Speaker
Kind of justify the investment in tooling by saying, well, we're going to go do some work on these underlying services and then come back and try this again and see if we can get further. Got it, that makes sense. So what did you see that proved it was worthwhile?
00:33:33
Speaker
Let's see. We certainly saw that we were able to decrease traffic to the app servers. And we really decreased traffic in the AZ as a whole.
00:33:50
Speaker
We learned some stuff. In particular, my recollection is that I mentioned two of the important backends are memcache and Vitess.
00:34:05
Speaker
And we were like, oh, actually, for memcache and Vitess we don't see very much failover in traffic at all. This is not surprising for memcache, because we knew we had a single memcache node for each key; there's no replication in the system. But we weren't entirely sure.
00:34:22
Speaker
There was a theory that we could remove a significant amount of traffic from Vitess by doing this as well, because Vitess has some affinitization per AZ kind of baked into it.
00:34:34
Speaker
So there was an idea that we would get some kind of passive traffic draining. And we were like, okay, maybe this is enough, because we knew that in order to manage these drains through the Vitess system, we would actually have to do work in the Vitess system, because of the consistency model.
00:34:52
Speaker
And so, conversely, we learned that we wouldn't get enough out of this kind of passive draining to make it so that we didn't have to go back and do the work on the Vitess failover.
00:35:07
Speaker
It was kind of finding the justification for investing back there. Yeah,
Service-Specific Draining Strategies
00:35:14
Speaker
that makes sense. You mentioned earlier that Google has this model of 100,000 services or whatever, or thousands of services, and each of them has its own fine-grained control behind it, but that's not what Slack did.
00:35:28
Speaker
But it sounds like you actually learned over the course of this process that there were a couple of services you needed to tackle directly, because they didn't just passively drain out.
00:35:39
Speaker
How did you think about prioritizing those? Because that's no longer just your team does some work and, ta-da, the system is better. Yeah. So more often than not... well, there were a few services that we could just silo by AZ, right? Just make it so that downstreams in one AZ can only reach upstreams in that same AZ, and it would just work.
00:36:10
Speaker
But there weren't actually that many of them, and not that many of the very important services. So we did end up mostly going service by service.
00:36:22
Speaker
And I think you actually came up with a lot of this. We ended up doing this kind of Eisenhower-matrixy thing where it was sort of difficulty versus value, right? So you had low difficulty, low value.
00:36:40
Speaker
You had high difficulty, low value; well, we don't want to do those. You have low difficulty, high value; we definitely want to do those. And then high difficulty, high value, and those are the ones that are going to be kind of spendy.
00:36:53
Speaker
I do remember doing that. I don't remember filling anything out myself, just drawing out the graph and saying, where do these dots go? Yeah. It's not obvious. Yeah.
00:37:04
Speaker
Yeah. And in practice, I basically went around to somebody from each of these services and was like, well, let's just do a napkin sketch. Let's do a one-pager about what we think the right way to drainify your service is and roughly how much that has to cost.
00:37:29
Speaker
And that was something we could feed into our engineering planning process, both for the project, which at this point was shaping up into a bigger cross-functional project, and also so we could make sure that, since each engineering team keeps their own roadmap for their service and the work they want to do over the next year or two, we could dovetail all that in there as well. I feel like there's a danger in these cross-organizational projects where people feel like all the work they need to do to keep their service stable is getting hijacked for something that's kind of trendy or a flight of fancy.
00:38:10
Speaker
I found that in the planning process, there were often times where teams would be like, well, you're asking us to handle another axis of complexity for our service, right? Now, instead of worrying about our global capacity, we have to worry about each AZ's capacity.
00:38:26
Speaker
Can we clean this thing up to make it easier to run? Can we reduce some complexity elsewhere? And we were able to make space for them to do this work that had oftentimes been deprioritized for a while.
00:38:41
Speaker
Interesting. I think this is an interesting tangent to take, in infrastructure in particular: if you own a service and it costs a lot to own that service, the team who owns it has the best local knowledge about it. But when you come in with some global system property and say we should behave this way, is there a way to navigate that?
00:39:09
Speaker
Mm-hmm. That's a good question, because often there's a lot of, I don't know if I would say political or cultural friction that comes up in these cross-functional projects, where, as I hinted before, teams feel a little put upon. And they do
00:39:31
Speaker
understand their service the best. And it's true, they do, actually. I sort of believe that every service team is already operating at kind of a local maximum.
00:39:42
Speaker
Given the organizational and technical constraints around them, they're almost certainly doing the best they can. We were able to, I think, reflect...
00:39:56
Speaker
There's a way in which I think we reflected a shift in organizational priorities, which sounds harsh, but in kind of a good way. We were able to say, well, some of this other stuff that you've been told to do is not as important anymore, right? Since we've decided this is
00:40:23
Speaker
an important thing, let's take some of these other priorities and make those less important. And let's also wind the things you want to do around fixing technical debt and making your service easier to run into the cross-functional project that we're doing. Yeah. There's an interesting alignment there:
00:40:50
Speaker
you have your project, but you also, because you've gone through the organizational wringer, understand the organization's priorities perhaps better than anyone else, or at least fresher than anyone else, and can help teams stack their roadmaps with work that hopefully they think is valuable and also aligned with the latest and greatest thinking about what the system should do as a whole.
00:41:14
Speaker
Which can feel a little political, but it's hopefully a valuable exercise. Yeah. And it was always important for me to be like, well, I'm coming here with some information. I have some knowledge about what has worked for other services, but you need to decide what's actually best for you.
00:41:36
Speaker
You know, we can sort of grade it together, right? We can talk about the quality versus the difficulty of the solution; we can go back and forth on that. But at the end of the day, I think everyone really wants to do their own design work.
00:41:53
Speaker
Yeah, understood, absolutely. That's the hard and thinky part of engineering in, I guess, 2025. So AI is going to do all the stuff that's not hard and thinky, right? What were some of the systems that lived in that easy, high-value area versus the systems that lived in hard and low-value?
00:42:17
Speaker
And were there common traits that kind of shoved them in one direction or the other? Yeah, for sure. In particular, we started talking a lot about stateful versus stateless services. Yeah.
00:42:31
Speaker
So, for stateless services...
00:42:36
Speaker
the web apps would be your canonical example of a stateless service, right? They're kind of born, they die, they get all the data they need to serve more or less from services upstream of them. And so those were
00:42:52
Speaker
almost trivial for us to do, because we could just hack them up by AZ. We introduced a filter in the service discovery layer where you could just get a magical list of servers that were all in your own AZ.
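A minimal sketch of that same-AZ filter idea, with hypothetical types; the fallback to the full list when the local AZ is empty is my assumption, not necessarily what Slack did.

```go
// Sketch of the service-discovery filter described above: a stateless
// caller asks for upstreams and gets back only the ones in its own
// AZ, which silos fan-out by availability zone.
package main

import "fmt"

type Endpoint struct {
	Addr string
	AZ   string
}

// sameAZ filters a discovered endpoint list down to the caller's AZ,
// falling back to everything if the local AZ has nothing to offer.
func sameAZ(all []Endpoint, callerAZ string) []Endpoint {
	var local []Endpoint
	for _, e := range all {
		if e.AZ == callerAZ {
			local = append(local, e)
		}
	}
	if len(local) == 0 {
		return all
	}
	return local
}

func main() {
	upstreams := []Endpoint{
		{"10.0.1.5:443", "us-east-1a"},
		{"10.0.2.5:443", "us-east-1b"},
		{"10.0.3.5:443", "us-east-1c"},
	}
	fmt.Println(sameAZ(upstreams, "us-east-1b"))
}
```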
00:43:06
Speaker
So we just kind of groomed those into silos, one per AZ. And that was completely trivial. On the other end would be the Vitess infrastructure.
00:43:21
Speaker
Vitess runs in these replica sets, usually about five wide. Each member of a replica set has all the same data, but only one of them is primary at a time.
00:43:36
Speaker
And so to accomplish a failover in the Vitess system, you actually have to send a signal to each replica set that's like, you are not primary anymore.
00:43:47
Speaker
One of the others needs to be primary. And that actually has to be managed through the Vitess orchestration framework. So the database team actually had to do a fair amount of implementation work to make this happen reliably and quickly across thousands of replica sets.
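At the orchestration level, the shape of that work is roughly the loop below. reparent() is a stand-in for whatever the database team's tooling actually invoked (conceptually a planned reparent through Vitess), and the shard names are made up.

```go
// Orchestration-level sketch of the Vitess piece: for every shard
// whose primary lives in the AZ being drained, ask the cluster to
// elect a new primary elsewhere. Not a real Vitess API call.
package main

import "fmt"

type Shard struct {
	Keyspace  string
	Name      string
	PrimaryAZ string
}

func reparent(s Shard, avoidAZ string) error {
	// Placeholder: the real implementation has to coordinate with the
	// Vitess orchestration layer and respect replication health.
	fmt.Printf("reparenting %s/%s away from %s\n", s.Keyspace, s.Name, avoidAZ)
	return nil
}

// drainPrimaries moves every primary out of drainAZ, shard by shard.
// Doing this reliably and quickly across thousands of shards is the
// hard part the database team owned.
func drainPrimaries(shards []Shard, drainAZ string) {
	for _, s := range shards {
		if s.PrimaryAZ != drainAZ {
			continue
		}
		if err := reparent(s, drainAZ); err != nil {
			fmt.Printf("failed to reparent %s/%s: %v\n", s.Keyspace, s.Name, err)
		}
	}
}

func main() {
	drainPrimaries([]Shard{
		{"main", "-80", "us-east-1a"},
		{"main", "80-", "us-east-1b"},
	}, "us-east-1a")
}
```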
00:44:07
Speaker
I think that was probably a lot of that work: higher complexity, but also higher need, right? Because that's a very important data source, and we really do not want to be writing data into an impacted AZ first in that system.
00:44:23
Speaker
Yeah. And the middle, I think, was probably memcache, where we started in this place where we were treating memcache as if it were a strongly consistent data store, because we had one big global memcache ring for the site.
00:44:41
Speaker
I'm eliding some details, but each piece of data was basically on one memcache server. And so I believe, well, you've talked to Glenn already.
00:44:54
Speaker
There's another episode. Yeah, we talked with Glenn a couple of weeks ago about how some data that everyone needs at the top of the hour is all on one memcache shard. Yeah. That's horrible.
00:45:05
Speaker
Yeah. So we had to figure out what to do with that. As I recall, it was something where we basically introduced a per-AZ replica of each memcache and then had to go through the code and figure out where the consistency was important and where it wasn't.
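One way to picture the resulting cache topology; this split-by-call-site sketch is my illustration of the idea, not Slack's actual cache client.

```go
// Hedged sketch of the memcache split described above: reads that can
// tolerate a stale answer go to a per-AZ replica ring, while the
// call sites audited as consistency-sensitive keep hitting the single
// authoritative ring.
package main

import "fmt"

type Ring string // stands in for a real memcache ring client

type CacheClient struct {
	localAZReplica Ring // replica ring in the caller's AZ
	authoritative  Ring // the original single global ring
}

// Get routes by whether the call site can tolerate replica staleness.
func (c CacheClient) Get(key string, needsConsistency bool) (Ring, string) {
	if needsConsistency {
		return c.authoritative, key
	}
	return c.localAZReplica, key
}

func main() {
	c := CacheClient{localAZReplica: "memcache-us-east-1b", authoritative: "memcache-global"}
	fmt.Println(c.Get("user:123:profile", false))      // fine if slightly stale
	fmt.Println(c.Get("channel:456:membership", true)) // audited as consistency-sensitive
}
```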
Future Scalability and Flexibility
00:45:27
Speaker
Absolutely. That makes sense. For folks who are thinking about this: by default, a lot of systems are probably not designed around "how can I split my site up into three proto-independent sites ten years in the future when we're at tens of millions of users."
00:45:43
Speaker
Is there anything you would encourage people to think about early on in system design to make this more plausible later? Or is this just a big-company problem?
00:45:56
Speaker
I was certainly tempted to say it's not. I mean, I do think there's a life cycle. Well, yeah. I think there are some companies that have to deal with a lot of different sites early on, and they have to think a lot about how their data is flowing back and forth.
00:46:14
Speaker
And so they deal with that kind of naturally. It's hard for me to say, off the top of my head, just when exactly this is a good idea or something to do.
00:46:32
Speaker
I will say that there's something organic in it, which is that at some point, assuming you're hosted at AWS, you'll become aware of the cross-AZ transfers.
00:46:43
Speaker
And I think in general, it's really useful, even from the beginning, to be able to segment your monitoring by availability zone. That's a good place to start getting your feet wet. Then you can at least look at bandwidth or RPCs and see what is flowing back and forth.
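Getting that visibility can be as simple as tagging request metrics with the AZ on both ends; here is a toy sketch where an in-memory map stands in for whatever metrics library you actually use.

```go
// Sketch of the observability starting point: tag every transfer
// metric with the availability zone of both sides so you can graph
// bandwidth and errors per AZ and spot cross-AZ chatter.
package main

import "fmt"

type key struct {
	Service       string
	SourceAZ      string
	DestinationAZ string
}

type Metrics struct{ bytes map[key]int64 }

func (m *Metrics) RecordTransfer(service, srcAZ, dstAZ string, n int64) {
	if m.bytes == nil {
		m.bytes = make(map[key]int64)
	}
	m.bytes[key{service, srcAZ, dstAZ}] += n
}

// CrossAZBytes is the number that shows up on the cost sheet and,
// later, on the "what is left behind after a drain" dashboard.
func (m *Metrics) CrossAZBytes() int64 {
	var total int64
	for k, v := range m.bytes {
		if k.SourceAZ != k.DestinationAZ {
			total += v
		}
	}
	return total
}

func main() {
	var m Metrics
	m.RecordTransfer("webapp->memcache", "us-east-1a", "us-east-1a", 4096)
	m.RecordTransfer("webapp->vitess", "us-east-1a", "us-east-1c", 8192)
	fmt.Println("cross-AZ bytes:", m.CrossAZBytes())
}
```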
00:47:01
Speaker
Usually, yeah, I think the first step is getting some observability and some measurement into the system. Yeah, that makes a lot of sense. You need to be able to see it; you need to know if this is a problem for you. And that pairs really nicely with, as you said earlier, the cost thing tending to be a leading indicator of how much you should care about independent AZs.
00:47:25
Speaker
Everyone's going to hit some amount of cost if you're not aggressively thinking about it. Yeah. And I think another thing to think about, that people often don't, is that a lot of the time, if there is one logical instance of something, then it is somehow or another a single point of failure.
00:47:45
Speaker
I'm thinking particularly, you know, if you have one transit gateway and it sits in between all of your availability zones, it is unlikely that the physical component will fail in all these availability zones, but it's very likely that somebody will eventually misconfigure it through the console.
00:48:00
Speaker
Yeah. And so this will have the same effect. When people fat-finger things, as we say, there's often a lot of emphasis on not making the mistake again, on linting or doing the right code review or whatever. But I think it's also a signal that there's a good opportunity to maybe increase some redundancy there.
00:48:28
Speaker
Yeah. I remember one of the things that blew my mind was talking to some large customer of Slack's and they pointed out that they consider Slack a single point of failure because it's
Addressing Human Error
00:48:38
Speaker
just one thing. It's like, of course, looking at it from the inside, like I promise it's many things.
00:48:43
Speaker
But to a first approximation, you know, a good product presents as a single unified thing. And their thought was, if you want redundancy, do we need to go buy a second chat solution that everyone sits in? My purpose in that meeting was to talk them out of that, and I think we mostly did, because it's a horrible question. But this is true at every layer in the stack, right? The products, the load balancers, the internal services, everything:
00:49:14
Speaker
a single point of failure is a single point of failure, and those don't feel good. Yeah. And also I will say, based on my experience, it is much more likely that outages will be caused by a human misconfiguring something than by some underlying equipment failure. Absolutely. And so I feel like that was one of the places in this project where we were able to accomplish the biggest shift in thinking: oh, this is not just a tool for "a data center caught on fire," this is also a tool for "somebody misconfigured some software" or "a bad deploy went out."
00:49:53
Speaker
Yeah. If you can drain away from it, that's substantially different. So where did things end up? What was the final state of the system when you left?
00:50:04
Speaker
The final state of the system when I left was that we could, and I couldn't tell you exactly what the numbers were, drain all the critical services out of an availability zone. I think our goal was five minutes or less. Most services would drain a lot faster.
00:50:22
Speaker
Five minutes or less was mostly driven, I think, by some of the underlying database work. We got to a place where we were doing it every week. We didn't talk about this that much, but the cycle we used as we iterated was to get everybody who was involved in a service together on Friday mornings and be like, okay, we're going to drain the site together.
00:50:45
Speaker
And everybody rides along in the incident channel until you get enough confidence that your service won't go down under drain, and then you don't have to pay so much attention anymore. So we rode along with that until we felt that we could really address most of the significant traffic, but we were also kind of okay leaving a long tail of stuff to be cleaned up. I'm thinking particularly about the memcache migration work, where we knew there were a bunch of keys that would need to be migrated
00:51:16
Speaker
over time, but we had a lot of confidence in the basic approach.
Institutionalizing AZ Draining Practices
00:51:22
Speaker
And there's something I talked about in the QCon talk, but not so much here.
00:51:28
Speaker
It's that I think one of the keys to success here was that we were able to find something that was incremental and did not require every service to walk along with the program in lockstep.
00:51:40
Speaker
So we got to this place where we felt things were mostly done, but there were some important work streams that would keep going for maybe as long as a year.
00:51:51
Speaker
Yeah, that makes sense, and that's, I think, hugely powerful. A migration where 90% done is 0% of the value sucks. It feels bad. And some of them just have to be that way: if you are slightly out of compliance, you are out of compliance, and that triggers a whole bunch of things. But if the goal is performance and reliability, you can be 80% done and that is just 80% done.
00:52:17
Speaker
Yeah. And it's also fragile, you know, to be delivering value so late in the project. I feel like you're kind of at risk until you're delivering real value and you have things locked in. Yeah, absolutely. It strikes me the other thing that was important
00:52:37
Speaker
by the formal end of the project was that we had gotten the system defaults for things to be in this new AZ-ified world.
00:52:47
Speaker
It was in a place where, if you're setting up some new infrastructure, there are guardrails in the system that would discourage you from doing things in a global way by default.
00:52:58
Speaker
You'd have to kind of sign an "I solemnly swear I'm up to no good" before you could do anything globally and incur cross-AZ traffic bills. That's cool. That makes sense. And it really locks in the progress going forward, so you don't end up having to shepherd this process literally forever in order to make it valuable.
00:53:23
Speaker
Yeah. At some point, you know, the organization needs to pick up the load. Absolutely. Yeah.
00:53:34
Speaker
And we're about out of time, so I'll ask one last, maybe spicy,
Closing Insights from Cooper Bethea
00:53:39
Speaker
question. Would you recommend people use Consul? And if so, how?
00:53:47
Speaker
If they're just starting a project.
00:53:50
Speaker
A deeply loaded question. I mean, I think it's... What can I say? and
00:53:57
Speaker
How can I say this... in short, probably not. I think that, as I mentioned, there's this sort of attractive-nuisance quality to Consul, where on the one hand it's a pretty easy-to-use service registration and discovery system.
00:54:19
Speaker
And on the other hand, it's a lock server. And these things are just... you need really different guarantees from each one, and those should probably just be separate systems.
00:54:32
Speaker
It's okay: you can use the AWS-native service registration or whatever else, but for service registration and discovery, you need an eventually consistent system that is going to guarantee you high uptime.
00:54:47
Speaker
For a lock service, you need strong consistency. It's just necessarily going to be a more delicate system from a reliability perspective. So I think the needs of these things are sufficiently divergent that they should just be two systems.
00:55:04
Speaker
That makes a ton of sense. Cool. Well, we're just about out of time. Thank you so much for coming on the show. I guess one standard wrap-up question:
00:55:17
Speaker
Where can people find you on the internet if they want to learn more about this or get in touch with you? Yeah, I'm mostly present professionally on LinkedIn. I'm just there under my name, Cooper Bethea.
00:55:29
Speaker
Yeah, I'm pretty inactive on other social media right now. If you look at my LinkedIn profile, there's a link to the talk I gave at QCon last November, where I expand on this in a lot more depth.
00:55:42
Speaker
We'll make sure to drop a link to that in the description. Thanks so much for having me, T.R. I've enjoyed this. Thanks so much for coming on.