Code Migration Secrets: How to Finish in Half the Time with AI

Tern Stories
60 Plays · 7 months ago

Code migration isn’t hard because the code is complex, it’s hard because the real problems only reveal themselves once you’re already in the middle of it.

In this episode, I break down why code migration projects so often drag on for months or years, and how AI can change that story completely.

From Stripe and Pinterest to Airbnb, Zapier, Twitter, and Slack, you’ll see how teams faced huge migration challenges, and how better planning, faster learning, and smarter communication turned the tide.

I’ll share lessons on why every code migration is unique, how small overlooked details can derail entire timelines, and how AI can uncover those issues before they blow up.

If you want your next code migration to feel less like chaos and more like a smooth, predictable process, this episode will give you the roadmap.

Get Tern Stories in your inbox: https://tern.sh/youtube

Transcript

Chaos in Code Migration

00:00:00
Speaker
Migrating is not actually hard because the code is hard. It's hard because you don't know what you have to do until you've basically finished it. At 50 engineers, the whole thing descends into chaos and you take months or years to finish anything of even moderate ambition. By the time they actually decided to hit merge, they still had 33,000 @ts-expect-error tags in their code base: 33,000 places where they knew TypeScript was going to throw an error, and they decided to ship anyway at that point. Migrations aren't slow because the code is hard. They're hard because learning happens while you're already doing them.

The Challenge of Code Migrations

00:00:36
Speaker
Welcome to Tern Stories. I'm your host, T.R. Jordan. Today, we're going to do a deep dive on one of the most counterintuitive facts about migrations. And that's that migrating is not actually hard because the code is hard.
00:00:49
Speaker
It's hard because you don't know what you have to do until you've basically finished it. And that's not a big deal if you're a solo developer and you have a small project. You just adjust.
00:01:01
Speaker
You learn as you go. No big deal. Even if you've got a team of three people, drop a note in Slack. Everyone adjusts. It'll be fine. But as you start to work on bigger teams and more complicated migrations, a team of 10 might need a meeting, a longer email, a note in the wiki about how you're doing things.
00:01:19
Speaker
And at 50 engineers, I don't know. I've never seen it done particularly well. And most of the time what happens is that information gets left out, people get left behind, the whole thing descends into chaos and you take months or years to finish anything of even moderate ambition.
00:01:38
Speaker
And that's challenging. And we talk a lot on this channel about how people have addressed that in the past. What are some of the ways that companies you've heard of, like Datadog and Slack and Twitter, have addressed those challenges?
00:01:52
Speaker
But you know what? It's 2025.

AI's Role in Streamlining Migrations

00:01:54
Speaker
We've got some new tools in our tool belt. So today, I want to walk through three real migration stories, but take a different lens to them. I want to show you how AI could have helped those teams learn faster, share faster, and finish their migrations faster.
00:02:09
Speaker
So let's go. All right. For the first chunk of this, let's talk about how AI can help you learn more upfront to accelerate a migration.
00:02:21
Speaker
Because so much of the effort of doing the migration is learning how it should go and what should be done. There are ways to make sure that you can learn this. And I want to take as an example here the Flow-to-TypeScript migration.
00:02:37
Speaker
If you're not familiar, Flow and TypeScript are both typing engines for JavaScript, JavaScript not having types of its own. And static typing is having a bit of a moment in the last couple of years. It's incredibly powerful. It just happens to be a great assistant for AI coding agents.
00:02:53
Speaker
But even before this, the companies that decided they wanted to see the future and add typing to their languages, companies like Stripe and Pinterest and Airbnb and Zapier,
00:03:07
Speaker
had a choice to make, because in 2015 there were multiple options for JavaScript. Flow seemed like a great choice. It had stronger React integration and some great momentum behind it.
00:03:18
Speaker
TypeScript still seemed pretty immature at the time. Turns out all those companies picked Flow, and they were just 100% wrong about who was going to win. TypeScript is the clear winner here.
00:03:29
Speaker
And they were all stuck with a choice. How do we get off of Flow? When do we move to TypeScript? The question wasn't where they're going. It's when they were going to do it, how they were going to do it.
00:03:41
Speaker
Each of these companies has done a great write-up on their experience here, and they all took roughly the same approach. There is an open source library out there, a codemod effectively, that will migrate you in one big jump from Flow to TypeScript.
00:03:56
Speaker
Normally, that would be a little bit scary because incremental migrations are safer and more predictable. But in this case, having Flow and TypeScript live side by side in the same code base is a nightmare.
00:04:08
Speaker
It's just not something you want to do. So the workflow is you run the codemod. You see how well it works. You test it.
00:04:18
Speaker
You make the fixes. You reset everything. Commit your changes back to the open source library and run it again. If you look at the write-ups from each of these companies, Pinterest took months to do this. Stripe took a year before they were confident in making this change. Same thing with Airbnb, same thing with Zapier.
00:04:35
Speaker
And in this story, you can see exactly the problem: they were learning how to do the migration as they went, because they didn't know when they were going to be confident in their ability to actually make the changes.
00:04:50
Speaker
And this is the first lesson I want to take from this: it doesn't make sense to have all of these companies working against an open source library, and yet it's all their own projects.
00:05:02
Speaker
The reason that happened is because in every one of those cases, their migration was unique. Their code base had unique challenges, unique usages of Flow that needed to be fixed in that open source library before they could use it.
00:05:18
Speaker
And yeah, 80% of the work was reusable. But what I take away from this, and what I think you should too, is that if you're going to use any given tool, you're going to have to make it work for your code base.
00:05:30
Speaker
And that's going to be a huge part of what actually drives your migration time.

Addressing Code Inefficiencies with AI

00:05:36
Speaker
If you're human, this is frustrating. Your time is limited. And by the time you get into actually making the changes, you don't want to go back and replan or rework things.
00:05:45
Speaker
AI can chase a lot more dead ends than you. Setting an AI off to go figure out what needs to be done and how it can be done is a huge uplift because it can get ahead of you and tell you what the lay of the land is before you actually get there.
00:06:00
Speaker
So I'll give you an example. Within Flow and TypeScript, there is a relatively simple feature that modern JavaScript has introduced: anonymous functions defined inline.
00:06:16
Speaker
If you want an anonymous function to return an object, you simply wrap it in curly braces. Easy. If you want to wrap a TypeScript function in curly braces,
00:06:27
Speaker
you need to make sure that you are disambiguating between returning an object and returning a type in front of an object. So in TypeScript, you need to wrap any typed returns without an explicit return statement in parentheses.
00:06:43
Speaker
The codemod didn't handle this case. Stripe had to go and fix that deliberately because they were extensive users of a pattern where they wanted typed returns from anonymous functions. It's not the most common pattern in the world.
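The parenthesization rule described above can be sketched in a few lines. This is a minimal illustration of the general syntax rule, not the specific pattern the codemod missed at Stripe, which the episode does not spell out; the `Point` type and function names are hypothetical.

```typescript
// Hypothetical type, used only for illustration.
interface Point {
  x: number;
  y: number;
}

// Without parentheses, the braces after `=>` start a function body rather
// than an object literal, so a form like
//   (x: number): Point => { x: x, y: 0 }
// does not return the object you meant.

// Wrapping the object literal in parentheses makes it an expression, and
// the annotation after the parameter list types the returned value.
const toPoint = (x: number): Point => ({ x, y: 0 });

// The equivalent long form with an explicit return statement.
const toPointExplicit = (x: number): Point => {
  return { x, y: 0 };
};

console.log(toPoint(3), toPointExplicit(3));
```

A converter has to emit the parenthesized form whenever an arrow function with a typed return yields an object literal directly, which is the kind of narrow case that is easy to miss until a real code base exercises it.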
00:06:54
Speaker
It only has to be fixed once, but it's extremely likely that within your code base, you're going to run into those kinds of edge cases. And if you can find a way to leverage AI to go read your entire code base, the entire changelog, the entire Flow definition, and the entire TypeScript definition, you're not going to turn up every single one of these weird edge cases, but you'll turn up the ones that matter for you.
00:07:20
Speaker
So that's the first lesson here. What else can you learn before you actually get into making these changes? The second one that I noticed as I read through these stories is that there is the developer experience to consider.
00:07:37
Speaker
Simply making the change, in the case of Stripe, everyone woke up one Monday morning and their code base was suddenly in TypeScript, which is kind of magical. But now everyone has to know TypeScript.
00:07:48
Speaker
So at Pinterest, for instance, they ran developer experience sessions. They had people who would show up and learn TypeScript. And this echoes what we've talked about on Tern Stories as well. Hal talked about moving from CoffeeScript to React.
00:08:02
Speaker
They had to go teach everyone how to use React first. So even if your technology is well-known and you're moving to the latest and greatest, there is a new set of syntax you have to learn. And every single one of these migrations, they had to consider, how do we make sure that people are comfortable with the new technology?
00:08:20
Speaker
And again, this is something that you can use AI to go find: what may be problematic. What they found across these migrations is, for instance, that the Flow ecosystem is stable, stagnant maybe. One of the big reasons for moving to TypeScript is that there is a vibrant third-party ecosystem. But of course, having a vibrant third-party ecosystem, especially in JavaScript land, means that there's a lot of churn.
00:08:51
Speaker
So people had to learn to update their dependencies and run a yarn install or an npm install when they were debugging. There were also explicit differences between the type engines. Flow has exactness as a default, but TypeScript pushes much more strongly for gradual typing, so its defaults are relatively untyped.
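One way to see the exactness difference on the TypeScript side is a short sketch like the one below. It only shows TypeScript's behavior (structural typing accepts extra properties when a value arrives through a variable); the Flow behavior is as described in the episode. The `User` type and values are hypothetical.

```typescript
// Hypothetical type for illustration.
interface User {
  name: string;
}

// An object with an extra property that User does not declare.
const raw = { name: "Ada", admin: true };

// TypeScript accepts this assignment: object types are structural, so extra
// properties are allowed when the value comes from a variable. (Writing the
// same literal directly in the assignment would trip the excess property
// check, which is a separate, narrower rule.)
const user: User = raw;

// The extra property is still there at runtime, even though the type hides it.
function listKeys(u: User): string {
  return Object.keys(u).join(",");
}

console.log(listKeys(user));
```

Developers coming from exact object types may expect that assignment to be rejected, which is the sort of workflow difference worth documenting before the switch.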
00:09:09
Speaker
These kinds of quirks mean that people will need to learn how to change their own workflows. And as part of any major migration, you, as the person who cares about this, are going to have to make sure that everyone else is successful with the new technology.
00:09:24
Speaker
And having a list of these things up front, well-documented, well-explained, with ways to learn more, is something that, yeah, you could do, but it takes away from your ability to actually go do the technical work itself.
00:09:37
Speaker
This is a huge area for AI to go dig up those details and those edge cases and share them with everyone else. The plan really does depend on what you learn at the beginning, and it'll stay more stable, and you can teach people and educate your coworkers if you can front load the plan.
00:09:56
Speaker
There's a really specific example here within Stripe's case that I found interesting. They did not fix all of their types, obviously, in one big go.
00:10:07
Speaker
They didn't add perfect, strong typing everywhere.

Strategies in Shipping with Errors

00:10:09
Speaker
By the time they actually decided to hit merge, they still had 33,000 @ts-expect-error tags in their code base: 33,000 places where they knew TypeScript was going to throw an error.
00:10:25
Speaker
And they decided to ship anyway at that point because they wanted to get off of the old technology and onto the new technology. And they understood that if they shipped with those exceptions in place,
00:10:37
Speaker
those were known knowns. Those were errors that they could deliberately suppress and they could burn them down afterwards. And better yet, they could share with their team exactly what was still to be done.
00:10:50
Speaker
So they weren't dumping the thousands of developers at Stripe into a world where there's unknown unknowns and landmines everywhere. They'd actually gone through and inventoried it beforehand.
00:11:02
Speaker
But of course, doing that inventorying takes work. There's no better technology that I can think of to go dump reams of information, 33,000 errors, into a computer and have it sort it and organize it for you so you can come out with an explicit way to march through the problems.
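A first pass at that kind of inventory does not need anything sophisticated. The sketch below counts suppression comments per directory and sorts the result into a burn-down order. It is an illustration under stated assumptions, not Stripe's tooling: the function names and sample paths are hypothetical, and file contents are passed in as plain strings so the example stays self-contained.

```typescript
// Map of file path to file contents (in practice you would read these from disk).
type SourceFiles = Record<string, string>;

// Count suppression markers per directory.
function countSuppressions(files: SourceFiles): Map<string, number> {
  const counts = new Map<string, number>();
  for (const [path, text] of Object.entries(files)) {
    const hits = (text.match(/@ts-expect-error/g) ?? []).length;
    if (hits === 0) continue;
    const dir = path.includes("/") ? path.slice(0, path.lastIndexOf("/")) : ".";
    counts.set(dir, (counts.get(dir) ?? 0) + hits);
  }
  return counts;
}

// Sort directories so the ones with the most suppressions come first.
function burnDownOrder(counts: Map<string, number>): [string, number][] {
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample input.
const sample: SourceFiles = {
  "src/api/users.ts": "// @ts-expect-error\nfoo();\n// @ts-expect-error\nbar();",
  "src/api/orders.ts": "// @ts-expect-error\nbaz();",
  "src/ui/button.ts": "// @ts-expect-error\nqux();",
  "src/ui/clean.ts": "ok();",
};

const order = burnDownOrder(countSuppressions(sample));
console.log(order);
```

Counting is only the starting point; grouping by error kind or by owning team is where the triage described in the episode would come in.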
00:11:22
Speaker
It echoes what I heard at Slack when Madeline talked about burning down the errors of a test migration. Having a list of known errors gives people the confidence to move forward.
00:11:34
Speaker
But even better, knowing how serious those errors are and doing a little bit of triage and investigation on those errors can give yet better confidence that you're in a state where it is okay to move forward even though you're not at 100%.
00:11:52
Speaker
So I thought those were fantastic blog posts. You should absolutely read them. I'll drop the links to those blog posts in the description of the video.
00:12:04
Speaker
Pinterest, Stripe, Airbnb, and Zapier all did the same migration. And what they learned is that every code base is different. So even if you're working on the same project as 100 other companies, think about how it'll work uniquely for your company and your code base.
00:12:21
Speaker
And use AI to figure out those differences as early as possible.

Scaling Challenges at Twitter

00:12:26
Speaker
All right.
00:12:28
Speaker
I would love to be able to learn everything up front. But TR, what if you can't learn everything up front? What if you discover something midstream? That's obviously always going to happen.
00:12:40
Speaker
When I was thinking about this question, the story that came to mind was Ryan King's telling of the Twitpocalypse. This was a repeated set of issues scaling how they stored tweets in the early days of Twitter.
00:12:54
Speaker
Early on, it was just a single MySQL database. They used signed integers because that happened to be the default in MySQL, which gave them about 2 billion tweets before they ran out of IDs. And they had to upgrade from 31 to 32 bits, from 32 to 64 bits, and from 64 bits on one machine to multiple machines.
00:13:13
Speaker
And what they found is that they kept learning the lesson that every single change was fractal.
00:13:24
Speaker
They needed to go find what were the small things that they didn't know. And then they needed to share them as broadly as possible because the thing that defined their schedule was the communication.
00:13:37
Speaker
It was how the people around them reacted to this information, because their mental bandwidth wasn't infinite. I'll give you an example. The easiest version of this was the 31 to 32-bit migration.
00:13:49
Speaker
Just go from a signed to an unsigned integer. And actually, with a couple of database tricks, this is like an ALTER query in MySQL that takes about an hour to run.
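The "31 to 32 bit" framing is easier to see with the arithmetic written out. This sketch just computes the largest value a signed and an unsigned 32-bit integer can hold; the variable names are illustrative.

```typescript
// A signed 32-bit integer spends one bit on the sign, leaving 31 bits of magnitude.
const maxSigned32 = 2 ** 31 - 1;

// An unsigned 32-bit integer uses all 32 bits for magnitude.
const maxUnsigned32 = 2 ** 32 - 1;

// Switching from signed to unsigned roughly doubles the available ID space.
const extraIds = maxUnsigned32 - maxSigned32;

console.log(maxSigned32);   // 2147483647, the "about 2 billion" limit
console.log(maxUnsigned32); // 4294967295
console.log(extraIds);      // 2147483648
```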
00:14:02
Speaker
No big deal. They had six weeks to do it. Seems like plenty of time. But they needed to make sure that everyone knew it was going to happen. Not just everyone at the company, but their third-party ecosystem. Because this is before Twitter had closed off access to the API in a lot of ways, and it didn't even have an official mobile client.
00:14:20
Speaker
So they needed to go tell everybody. And what they found is that they didn't know how their third-party developers were going to react. So they simply told them as early as possible.
00:14:32
Speaker
And that gave them almost a month for everyone to react. And that mental bandwidth that they gave those teams meant that that first migration was successful. Similarly, going from 32 to 64 bits, what they found is that they figured this would be simple again.
00:14:49
Speaker
Tell everyone externally, tell everyone that they needed to make a couple of updates. But in 2009, 32 to 64 bits was not actually so simple. The problem that they found is not that everyone had to make a change. It was that the core libraries
00:15:06
Speaker
that people were using to store Twitter IDs did not support 64-bit integers by default. You had to go find a version in Ruby, in Java, of the libraries you used to represent your tweets and go change them.
00:15:21
Speaker
Because people were using libraries that didn't have 64-bit integer support. Database libraries that didn't have 64-bit integer support. And that ended up being the problem. And actually, it was mostly a problem internally because they were using Rails as their monolith. They were moving increasingly to the JVM and Scala.
00:15:38
Speaker
And internally, they had to tell everyone, you probably have to go look at your libraries and do this evaluation. And that takes time. So...
00:15:50
Speaker
Yeah, absolutely. You could use AI. But the lesson that I actually take away from this is not that everyone should be using AI for everything. That's a great feeling, and I don't disagree. But more importantly, you need to share that information as fast as possible.
00:16:04
Speaker
Because if you don't know exactly what you're telling people to consider, you're telling them to go do an evaluation, do real work. They need time to internalize it, to think about it, to pick the right path going forward.
00:16:15
Speaker
And that means being as clear as possible on what you found and how it's communicated.
00:16:23
Speaker
Twitter didn't have Slack. Slack didn't exist at the time, which is now the de facto way to go communicate with your team internally. But even then, having been at Slack, I've seen plenty of channels that nobody reads.
00:16:35
Speaker
Getting a coherent, concise plan in front of people really makes a difference. And being able to say, this is what we've learned. This is how we learned it.
00:16:45
Speaker
And this is how you should act on it buys that time. Making that fast is where I actually see a huge opportunity here. And in conversation with Ryan, he talked a lot about how the planning and communication was the important part of that project. If you're running one of these projects, as soon as you learn something, make sure that you do as thorough an investigation as possible. Share that information as broadly as you can.
00:17:16
Speaker
So that's the first bit. The interesting piece about the next saga in that story is that everything got much harder. Once you're on 64 bits, great.
00:17:28
Speaker
That's a lot of integers. Two to the 64 is far more tweets than they'll need for a very long time. But I did hear recently that Twitter is about to exhaust those. They needed to scale off of a single machine.
00:17:41
Speaker
And the interesting thing that they did there, and you should absolutely go listen to the episode, is moving to a different ID generation system, so IDs could be generated in a distributed fashion using a technology called Snowflake.
00:17:58
Speaker
What they discovered is as you are rolling out 64-bit integers, JavaScript nominally supports greater than 32-bit integers. But it's not an integer type.
00:18:11
Speaker
That one number type that's baked into JavaScript actually is a hybrid type. If you're a JavaScript developer of any experience, I'm sure you've tripped over this.
00:18:21
Speaker
The numbers 1 and 2 and 5 and 2 to the 10 and 2 to the 50 are all represented precisely. But as soon as you cross over 2 to the 53, JavaScript's 64-bit float can no longer represent every integer exactly.
00:18:36
Speaker
It's no longer precise enough to represent an ID. And what they actually had to do there is that they shared that information as fast as possible. And they moved to offering strings of IDs.
00:18:49
Speaker
So JavaScript could go parse it correctly. It turns out that the fact that IDs are numbers is not important. They just have to be unique identifiers. But that wasn't good enough because a string ID requires meaningful change in a number of clients.
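The precision problem and the string workaround can be shown with a small payload. This is a minimal sketch: the JSON below is made up for illustration, and the `id_str` field name follows the convention Twitter's API used for string IDs.

```typescript
// A made-up payload carrying the same ID as a number and as a string.
// 9007199254740993 is 2^53 + 1, the first integer a 64-bit float cannot hold.
const payload = '{"id": 9007199254740993, "id_str": "9007199254740993"}';

const tweet = JSON.parse(payload) as { id: number; id_str: string };

// The numeric field is silently rounded to the nearest representable value.
const numericRoundTrip = String(tweet.id);

// The string field survives parsing untouched.
const stringRoundTrip = tweet.id_str;

console.log(numericRoundTrip);               // "9007199254740992"
console.log(stringRoundTrip);                // "9007199254740993"
console.log(Number.isSafeInteger(tweet.id)); // false
```

Nothing throws and nothing warns, which is why this kind of bug tends to surface only when two different tweets appear to share an ID.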
00:19:04
Speaker
And what they heard by publicizing that information quickly is that not all of their internal and external clients were going to have enough time to use the string IDs instead of a number ID.
00:19:17
Speaker
It's just too far of a bridge given database architectures or caching architectures or the libraries they're using. So it was kind of a bummer that Ryan actually had to go back and change the starting date for these IDs in order to buy more space.
00:19:35
Speaker
He'd set the kind of zero date for these IDs at the time that Jack tweeted the first public tweet, just setting up my Twitter. But that meant that by the time you got to the current day, they were generating integers that were over 2 to the 53. And that was going to break everyone who hadn't managed to switch to string clients.
00:19:56
Speaker
So they ended up moving the epoch up to essentially the moment of launch. And that meant that the IDs that were being generated in real time were still less than 2 to the 53. So even if you were affected by this bug, you still had a few months to go deal with it before the new IDs crossed over 2 to the 53.
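Why moving the epoch shrinks the IDs is easier to see with the ID layout written down. The sketch below assumes the layout from the open-source Snowflake project (a millisecond timestamp offset in the high bits, followed by 10 worker bits and 12 sequence bits); the episode does not state the layout, and the dates and function name here are illustrative. Plain numbers are used only to keep the example short; a real generator would use 64-bit integers.

```typescript
// Assumed layout: timestamp offset shifted past 10 worker bits + 12 sequence bits.
const TIMESTAMP_SHIFT = 2 ** 22;
const WORKER_SHIFT = 2 ** 12;

// Compose an ID from the time elapsed since the chosen epoch.
function composeId(nowMs: number, epochMs: number, worker: number, seq: number): number {
  return (nowMs - epochMs) * TIMESTAMP_SHIFT + worker * WORKER_SHIFT + seq;
}

// Illustrative dates: an early epoch years before launch, and an epoch at launch.
const earlyEpoch = Date.UTC(2006, 2, 21);
const launchEpoch = Date.UTC(2010, 10, 4);
const oneDayAfterLaunch = launchEpoch + 24 * 60 * 60 * 1000;

const idWithEarlyEpoch = composeId(oneDayAfterLaunch, earlyEpoch, 1, 0);
const idWithLaunchEpoch = composeId(oneDayAfterLaunch, launchEpoch, 1, 0);

// With the early epoch, years of elapsed milliseconds push the ID past 2^53.
console.log(Number.isSafeInteger(idWithEarlyEpoch));  // false
// With the launch epoch, the same moment yields an ID JavaScript can still hold.
console.log(Number.isSafeInteger(idWithLaunchEpoch)); // true
```

The timestamp offset dominates the size of the ID, so the epoch is the one knob that buys headroom without changing the format clients see.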
00:20:15
Speaker
And that lesson here is so interesting because small changes can be huge, and you need to be able to research midstream: what are your options? What could you change? Where are the real dead ends?
00:20:29
Speaker
And that again is where I've seen a lot of folks today using AI to essentially parallelize their efforts. You don't want to slow down your migration. You don't want to stop doing the work because you're up against the clock.
00:20:43
Speaker
But if you can send an AI off to figure out, hey, are there other options here? Hey, how widespread is this problem? Can we go prove a different piece?
00:20:55
Speaker
Ryan spent weeks running Snowflake to prove that it was going to generate totally unique IDs. The more you can use AI on those lower value or verification tasks, the better you can become and the more time you can spend on the things that matter.
00:21:12
Speaker
Being able to figure out the scope and gather more information allows you to communicate more effectively, and that communication is really frequently what will drive the success of a migration.
00:21:28
Speaker
You need to be able to listen to those signals and take them seriously, and AI can lighten that load. What if you learn something that is material to the project? Everything within Ryan's story was minor signals.
00:21:40
Speaker
Simply communicate faster. Simply tell people that a library was updated. But what if the change you discover in flight changes the entire shape of the migration?
00:21:54
Speaker
What if you've learned something that is going to fundamentally change your approach?

Lessons from the Gov Slack Project

00:22:00
Speaker
I've hit this. My last project at Slack was GovSlack, the simplest product spec you've ever written, at least in the first three lines.
00:22:08
Speaker
We want to take Slack, and we want to deploy it in AWS GovCloud so it can be used by people who need a higher level of security and compliance. What we learned during this project is that the smallest details define the schedule.
00:22:22
Speaker
We learned this again and again and again. So this was my project from before the beta through GA launch through getting compliance. And it required 300 people at Slack, or 50 different teams, to contribute.
00:22:39
Speaker
It was a behemoth. I'm incredibly proud of what we accomplished. But I got to say, if I had known then what I know now, I think we could have gotten it done a year earlier.
00:22:53
Speaker
Okay. What do I know now? I want to talk about AppArmor. One of the core sets of compliance requirements here is FedRAMP High. There are 600 separate controls in FedRAMP High, and they are dense.
00:23:09
Speaker
I blissfully did not have to read them. I had amazing security partners who helped me interpret and understand them, but I got pretty familiar with them. There are a set of configurations that you need to handle. Things like: your application needs to have minimal access to the files, libraries, and network resources in its baseline configuration.
00:23:32
Speaker
CM-3: you need configuration change control. Your application cannot, should not, alter its security posture live, and everything should be audited. AC-3, access enforcement: implement mandatory access controls.
00:23:50
Speaker
As we read through how we wanted to tackle this, we had a couple of ideas. And in many cases, the requirement is simply auditing, least privilege, good configuration. You can accomplish this with what you need.
00:24:05
Speaker
What we realized as we went through this is that we weren't quite meeting the bar across a number of different services. We needed to implement AppArmor. AppArmor is, if you're not familiar, a way to essentially configure how a binary, typically on Linux as we were doing, interacts with its environment. What system calls is it allowed to make?
00:24:28
Speaker
What changes to the running binary is it allowed to make? Typically, none at all. The interesting thing here is that this actually ended up being the long pole in the tent for submitting our compliance requirements.
00:24:43
Speaker
This is the scoping. We didn't know that we were going to do AppArmor at the start. We'd suspected, we were hoping that we could get around to it. I think the interesting thing here is that we lost a couple of months because we were unsure of exactly how bad this was going to be.
00:25:02
Speaker
This scoping was a real activity, and it took us a couple of months of experimenting and prototyping, working with people who were actually going to have to implement AppArmor, because Slack has dozens of services in the critical path, hundreds overall. They'd all have to implement it.
00:25:18
Speaker
Making sure that we understood how this work was going to be done was the work. And the thing that really, really turned the corner for us is that once we decided this was going to be done, it was fairly straightforward. It did take a quarter or so for everyone to implement. But early on, we lost that time because we didn't know how to do that scoping.
00:25:45
Speaker
And we were unwilling to stand up and make that call because we needed more information. And AI here can help set people up for success because you can share context.
00:25:57
Speaker
You can do honest, objective scoping partially and then layer human judgment on top of that. Having AI give us this kind of shared reality that we could trust, and yeah, you have to build a trusted shared reality with AI to make sure you're not getting hallucinations, but it can be done.
00:26:17
Speaker
That can give you an enormous lever to go have human conversations about scoping. I'll give you an example that's even earlier than that. When we were building the environment, at some point, we had to rotate all the secrets, an exercise you have to do if you're doing high compliance work like this.
00:26:36
Speaker
The team that took the longest took twice as long as any other team. And the reason for that is because they had twice as many secrets. It's a manual operation to do that. Not every one of those is automated, especially in a heterogeneous complex environment.
00:26:52
Speaker
It's pretty typical at that kind of scale. And what I noticed during that process is that we were unsure of how long it was going to take, in large part because we had to get that information again as a scoping exercise.
00:27:06
Speaker
And it turns out that if you're the person who's asked, hey, how long is it going to take you to rotate secrets? And you give a number that you know is going to drive the schedule of this, you know, priority zero project for the company. And you say, well, I need two more months than everybody else.
00:27:23
Speaker
That's not an easy answer to give. And it's not an easy answer to hear. The organization doesn't want to hear it. What I really learned during that is that you need to pay attention to what you're asking people to look at and where scoping can be an activity that is subjective versus objective.
00:27:46
Speaker
That engineering manager did an amazing job sticking up for her team and making the reality of the work clear, but there was still distrust because it was simply a conversation.
00:27:59
Speaker
Bringing data to the table allows you to have much more confidence that you are truly seeing an outlier.

Effective Migration Planning with AI

00:28:06
Speaker
And having that data allows you to marshal the entire team around, yeah, this is what we need to do, and refocus on, we are going to support that team in rotating their secrets. We're going to support the teams that need to migrate to AppArmor.
00:28:20
Speaker
And that refocusing is only something you can do when you have confidence and when you have data. And I know that going forward, I'm not going to make this mistake again. I know that as we do migrations and we figure out how to move forward, I need to be able to see the data.
00:28:39
Speaker
So I'm not simply asking people, can you do it? Or can I bully you into a shorter schedule? Because the work is the work, and you can't get out of it.
00:28:51
Speaker
So, in every migration, one small thing becomes the long pole. You've got to find it, and AI can help you find it before it burns your schedule.
00:29:00
Speaker
So that's it.
00:29:03
Speaker
Migrations aren't slow because the code is hard. They're hard because learning happens while you're already doing them, and there's a real opportunity for AI to help you speed up that learning at the beginning of the project, to help you communicate and help your teams learn faster in the middle of the project, and to understand what is truly the long pole in the tent.
00:29:23
Speaker
You can research harder, you can share those learnings, you can spot that schedule-defining detail. And this is all muscle you can build. AI today is in a place where no product is perfect, but the more you use it, the more comfortable you are with it.
00:29:40
Speaker
And if you can start to use AI, even in little pieces, to go build shared data truth, to build migration plans, and to give teams a space to go organize and interrogate what they are going to do, you can start to pull information forward in a way that allows migrations to really feel like they're on rails.
00:30:03
Speaker
AI isn't going to replace migration planning, but it can make teams learn 10 times faster. And that is the real bottleneck.