
Building Render: Inside a Modern Cloud Platform (with Anurag Goel)

Developer Voices

How would you build a Heroku-like platform from scratch? This week we're diving deep into the world of cloud platforms and infrastructure with Anurag Goel, founder and CEO of Render.

Starting from the seemingly simple task of hosting a web service, we quickly discover why building a production-ready platform is far more complex than it appears. Why is hosting a Postgres database so challenging? How do you handle millions of users asking for thousands of different features? And what's the secret to building infrastructure that developers actually want to use?

We explore the technical challenges of building enterprise-grade services—from implementing reliable backups and high availability to managing private networking and service discovery. Anurag shares insights on choosing between infrastructure-as-code versus configuration, why they built on Go, and how they handle 100 billion requests per month.

Plus, we discuss the impact of AI on platform adoption: Are LLMs already influencing which platforms developers choose? Will hosting platforms need to actively support agentic workflows? And what does the future hold for automated debugging?

Whether you're curious about building your own platform, want to understand what really happens behind your cloud provider's dashboard, or just enjoy hearing war stories from the infrastructure trenches, this episode has something for you.

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices

Support Developer Voices on YouTube: https://www.youtube.com/@DeveloperVoices/join

Render: https://render.com/

Render’s MCP Server (Early Access): https://render.com/docs/mcp-server

Pulumi: https://www.pulumi.com/

Victoria Metrics: https://victoriametrics.com

Loki: https://vector.dev/docs/reference/configuration/sinks/loki/

Vector: https://vector.dev/

Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social

Kris on Mastodon: http://mastodon.social/@krisajenkins

Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

Transcript

The Complexity of Software Deployment

00:00:00
Speaker
I'm told putting software into production has never been easier. You just throw Kubernetes at it, right? No, no. Even with wonderful tools like that, putting software live is still fairly painful.
00:00:13
Speaker
We've definitely got better at it. DevOps is probably as good today as it's ever been, but I wouldn't describe it as easy. You've still got to know how to use those tools, of course.
00:00:24
Speaker
You've got to plan your provisioning. You've got to plan for monitoring and telemetry and security updates and scaling and infrastructure as code, which I believe means you choose between swearing at Terraform files or swearing at YAML files.
00:00:42
Speaker
It's not really surprising that the process of putting code live is a whole sub-profession in itself. And it's also not surprising that there are services out there offering to do it for you, to take the pain off your plate.
00:00:56
Speaker
I personally would never start one of those businesses. It scares me too much. You can become an expert at the things you need to be expert at, the provisioning, monitoring, security stuff, but you still get so many other problems that are really daunting, like user demands.
00:01:13
Speaker
Every one of your users will want a slightly different set of features, and they will all want 100% of the things they want right now. You have a really unique scaling problem.
00:01:24
Speaker
You're going to be trying to scale up your user base while every one of your users is trying to scale the demands they're going to put on you. That's fairly terrifying. You have a huge usability and user interface problem. You've got to be more usable than AWS.

Introducing Anurag Goel and Render's Growth

00:01:40
Speaker
Actually, that part seems possible, but the rest sounds really hard. But it's a rule of thumb in our industry. If there's a bunch of really hard problems out there, there's going to be someone who enjoys solving them.
00:01:52
Speaker
And when I find out about them, it's a rule of thumb that I'm going to want to get them in and find out what makes them tick. So I'm joined this week by Anurag Goel. He is the founder of Render, which is a hosting platform somewhere between a modern Heroku and a simplified AWS.
00:02:09
Speaker
He's been building Render in Go for several years, and it's getting huge. They're taking on something like 150,000 new users a month. How do you do that?
00:02:21
Speaker
How do you manage the scale? How do you manage the infrastructure? And how do you keep them all happy? What are the design decisions you have to make and live with? And while we're taking notes, how do you get 150,000 users a month?
00:02:36
Speaker
You know, if you're starting a software as a service platform, you might want to know what the secret to that is too. All seems worth knowing. We're going to dig into all of it and the impact of AI on the hosting market.
00:02:48
Speaker
What happens when so many people are delegating their hosting decisions to an LLM? How does that change things? Before we begin, this episode has been sponsored by Render, so let me tell you what that means and what it doesn't mean.
00:03:03
Speaker
They have no control over the questions. They did not get a list of questions beforehand, and for the record, I think that would be a terrible idea. People are often at their most interesting when you get them thinking on their feet.
00:03:15
Speaker
I think knowing the questions beforehand would do everyone a disservice. Render did get a preview copy of the episode before it went live, and they got to choose the release date of the episode, which in practice means they jumped to the top of the queue.
00:03:30
Speaker
So with that made clear, it's time for me to play host to a hosting expert. I'm your host, Kris Jenkins. This is Developer Voices, and today's voice is Anurag Goel.
00:03:53
Speaker
Joining me today is Anurag. How are you doing, sir? I'm doing great, Kris. It's really good to be here. I'm glad you could join me. Where are you coming in from? I am based in San Francisco.
00:04:04
Speaker
Ah, so your apartment costs are sky-high. I like to not think about that.

Building and Scaling Render's Hosting Platform

00:04:12
Speaker
Okay. We won't start on a downer. We'll start on something cheerier.
00:04:16
Speaker
Yeah. So getting straight to the point: Render, it's one of these hosting platforms that will remind people of Heroku. In some form, it's like, I can host enough stuff there that I don't have to worry about DevOps.
00:04:34
Speaker
And I was thinking about this and I thought, well, if I were trying to build that 10 years ago, honestly, I wouldn't know how to build it. If I were trying to build it today, I'd probably say, yeah, we need to hire a bunch of Kubernetes people and a good enough web designer to make me a nice web platform.
00:04:54
Speaker
And that would probably get me to version one of a hosting-esque platform. Is that roughly right? It depends on the people. So, you know, sure, these days, if you have someone who understands Kubernetes incredibly well and understands developer experience and can use Claude Code or Cursor or VS Code
00:05:23
Speaker
effectively, then maybe just one person can try to build the very, very, very first version of an internal or even an external developer platform.
00:05:39
Speaker
But it'll be missing 99% of the features that they will eventually have to build. Okay. Okay. In that case, I was expecting you to say it was a little bit harder than that. But if you're thinking it's easier, once I've got my MVP of this, which I'm going to say hosts enough of a static JavaScript site and enough of a Postgres database that I'm running,
00:06:09
Speaker
what makes it hard after that? Yeah, well, hosting a Postgres database actually makes it 10x harder.
00:06:20
Speaker
So the MVP would actually just be maybe a static site host, or a simple Python website hosting thing where you could run a Flask API. Because even static sites
00:06:36
Speaker
versus just a pure backend web service, those are two very different things. Because for a static site, you have to have a CDN in place, you have to invalidate caches, you have to do headers properly.
00:06:50
Speaker
So there's a whole thing that goes into building good static site hosting. But then, the way you start, maybe, is by just simply hosting a backend HTTP service.
00:07:07
Speaker
And that could be, you know, let's say it's a Python service. The thing that makes it more and more complicated from that point on is reliability and scale. And the 1,000-plus features that your customers will ask for once you start at this point.
00:07:30
Speaker
And the next feature they might obviously ask for is, like you were saying, a Postgres database. And building a reliable, scalable,
00:07:42
Speaker
enterprise-grade Postgres is a fairly large endeavor. And this is why there are only so many solid Postgres hosting companies in the world. Yeah.
00:07:55
Speaker
What makes it hard, out of interest? I mean, I know I'm naive, but that's why I've got an expert to explain it to me. It feels like it's just, okay, here's another process that needs to have a decent chunk of disk reliably available.
00:08:10
Speaker
Well, my life would be significantly easier if that were the case. So I think the main thing that makes it interesting and hard is making sure that all the operations, things like
00:08:31
Speaker
backups and point-in-time recovery and high availability and disaster recovery, all of those things are built into the solution and are completely hidden from the user, because they pay you to not see those things.
00:08:55
Speaker
And you have to give users an interface that allows them to size their Postgres up and down. You have to figure out how to give them the right level of observability into their Postgres. That's a lot of work.
00:09:10
Speaker
You have to make sure that if your underlying hardware needs to run maintenance for whatever reason, because machines fail, then their workloads are interrupted as little as possible.
00:09:22
Speaker
You have to make sure that you have the right performance on both the compute side and the disk side, because disks have varying levels of performance, right? You could host something on a local NVMe SSD, or you could host something on a network disk, like AWS Elastic Block Store.
00:09:44
Speaker
How do you make sure that when you start up a new Postgres, you have a system that reliably attaches whatever disk you're creating, allocates a machine to host the compute, and makes those things talk to each other performantly?
00:10:01
Speaker
And then, if things are running hot on Postgres, as often happens, some config might require your Postgres to be restarted, or users might change something that forces Postgres to consume all its CPU.
00:10:21
Speaker
There are all kinds of those things that come up in production use. And then when you expand this to, in Render's case, millions of users doing all kinds of things with Postgres, then it becomes really interesting.
00:10:38
Speaker
And then when you expand it to people using Postgres from other providers and migrating over to Render, that makes it even more interesting. How do you make the migration as low-downtime as possible?
00:10:51
Speaker
How do you set up follower databases? How do you set up, you know, read replicas? How do you set up multiple read replicas? How do you punch through firewalls? Yeah, of course. I mean, the other big thing is private networking for your application and Postgres.
00:11:09
Speaker
So a lot of providers simply offer a public URL for your Postgres, especially if they're dedicated Postgres providers. They'll give you a public Postgres URL. Obviously, you have to build authentication; you have to give people a way to log into their Postgres using some sort of proxy, and so on.
00:11:27
Speaker
But if you are an end-to-end, full-stack, multi-service orchestration layer like Render is, then you would want to build out private networking between your website or your backend service and your Postgres.
00:11:45
Speaker
And that means that your backend service should be able to talk to Postgres without going over a public internet connection. And that implies building out a way to assign internal IPs, and then having a service name for the internal IP. So doing service discovery, having internal DNS, because instead of giving users internal IPs, you actually want them to have a name.
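As a concrete illustration (the hostnames and credentials here are invented for the example, not Render's actual naming scheme), the difference is between handing a user a raw internal IP and a stable internal DNS name that survives restarts:

```
# Fragile: points at a specific compute process; breaks when Postgres moves
DATABASE_URL=postgres://app:secret@10.42.7.13:5432/app_db

# Stable: an internal DNS name resolved via service discovery; the IP
# behind it can change freely on restart or failover
DATABASE_URL=postgres://app:secret@app-db.internal:5432/app_db

# Each read replica gets its own stable identifier
READ_REPLICA_URL=postgres://app:secret@app-db-replica-1.internal:5432/app_db
```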
00:12:12
Speaker
So the IP could change if the Postgres restarts or, you know, something happens with the compute process. And so how do you make sure that the application has a stable identifier?
00:12:23
Speaker
How do you make sure that read replicas have their own special identifiers? So, you know, I could go on and on. You're smiling like you enjoy this problem, and yet you're filling me with terror, to be honest.
00:12:36
Speaker
Well, this is why Render exists. This is exactly why
00:12:44
Speaker
we built Render and continue to build not just, you know, really strong enterprise-grade Postgres, but also the entire infrastructure that allows companies to not just start really quickly, but also to scale on this platform, on Render, as they grow. And in that sense, I think Render's perhaps
00:13:06
Speaker
really the most forward-thinking cloud that allows you to both start quickly, but also focuses on how your needs will change as your applications become more complex, as you start creating more and more services.
00:13:22
Speaker
Now, that sounds good; I'm going to have to get you to outline why I should believe that, with details. That's what this show is about. Yeah, absolutely. I'd love to. So, from what I'm hearing from you, I see a bunch of problems. One is that for a service like this, everybody

Infrastructure Management at Render

00:13:40
Speaker
wants to not think about any of the features of the database except the ones they want, right?
00:13:45
Speaker
Like maybe I need point-in-time recovery, but I don't want to think about any other aspect of Postgres. Yeah, but someone else might want extensions. And how do you make sure the right extensions are available? Yeah, yeah. How do you do version upgrades of Postgres? I mean, again, I could go on.
00:14:00
Speaker
Yeah. So then we have to multiply it kind of horizontally by: it's great that you support Flask, but I want to use Node, or I want to use Rust as my web server, or whatever.
00:14:11
Speaker
Absolutely. You've got this universe of features and a universe of demands. How do you decide what's worth focusing on?
00:14:21
Speaker
Oh, and that is the eternal question, isn't it? I think it all comes down to some combination of the biggest problems that the broadest subsection of the developer community is feeling,
00:14:42
Speaker
and then also thinking about how you can offer a unique solution there. So as an example, S3, despite its configuration complexity, is fundamentally a great product.
00:15:03
Speaker
Amazon has been building it since 2005. It's a product that has matured over 20 years. But not only is it continuing to innovate, it also continues to scale. And the world runs on S3.
00:15:19
Speaker
If S3 were to go down across the board, we would all shut our computers down and watch TV. And maybe TV wouldn't work either. Yeah, I think we've seen that happen a couple of times, all right?
00:15:33
Speaker
Well, the good thing is that it doesn't happen. I mean, I don't even remember the last time S3 had an incident. There's been a bunch of AWS incidents. Oh, yeah, that's fair. That's fair. But S3 itself, the service, is tremendously stable.
00:15:46
Speaker
And so, you know, a lot of Render customers ask us to build object storage, because they know that S3 is hard to configure and Render would do a better job with the usability side of it. Which, you know,
00:16:02
Speaker
we will. But should we build an S3 replacement, or should we solve a problem that AWS hasn't solved?
00:16:13
Speaker
You must be tempted to say, why don't we just put a nicer interface on S3 and call it ours? I think that's right. But even then, you're taking on a lot of responsibility at the end of the day, even if you put an interface on top of S3, because the most obvious thing you then have to think about is:
00:16:34
Speaker
What happens when S3 releases a new feature or changes its API, or there are specific performance characteristics that you know your users need that are only available in the new version of S3's API?
00:16:51
Speaker
And so you have to keep up with S3 if you simply decide to put a skin on it. And your users would actually be, I think, better served going straight to S3, even if it takes a little bit longer to get started and to secure everything, and you deal with IAM and security groups and it's a bit of a mess.
00:17:14
Speaker
I think skinning AWS products is a failing proposition over the long term, because you have to really provide true value on top for it to make sense.
00:17:34
Speaker
And it's very hard to do that with a simple skin. Yeah, okay. Yeah, I'll give you that. Then let me pin this down on something technical, which is:
00:17:48
Speaker
I noticed you have a system for configuring deployments using YAML. Blueprints, you call it. And that came into my mind, like, YAML is great in that it's that kind of user-friendly, or at least developer-friendly, skin over a more complex service.
00:18:06
Speaker
But you've got this problem that YAML isn't very good at expressing really complex ideas. So aren't you there hiding complexity in a way that's going to come back and bite you when someone says, I need that feature?
00:18:22
Speaker
I guess the question is, what kind of ideas are you thinking about when you think about YAML? Because there is the definition of infrastructure, which is kind of sort of static.
00:18:38
Speaker
And then there's code, which is dynamic. And yes, YAML is not a full programming language. You can't express, you know, loops and stuff in YAML, although people have tried.
00:18:51
Speaker
But if you wanted to specify a state of the world that you wanted to exist, without adding logic, without adding runtime complexity, then YAML actually is very good at that.
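A desired "state of the world" like this can be written as pure data. The sketch below is loosely modeled on Render's blueprint format, but the exact field names are illustrative rather than copied from Render's docs:

```yaml
# Declarative snapshot: no loops, no runtime logic, just desired state
services:
  - type: web
    name: api
    runtime: python
    buildCommand: pip install -r requirements.txt
    startCommand: gunicorn app:app
    plan: starter

databases:
  - name: app-db
    plan: basic
```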
00:19:06
Speaker
And so is JSON. And, you know, right now I'm treating YAML and JSON interchangeably. Those are just data formats as opposed to compute formats. General programming languages are more about code expressions, runtime expressions, whereas YAML, JSON, those are all data formats.
00:19:25
Speaker
And as long as you can think about your infrastructure specification as data, YAML and JSON are actually quite reasonable as specifications.
00:19:37
Speaker
And you don't need anything more complex. But I mean, another way to think about this is: look, people have built hugely complex systems with millions of lines of YAML writing Kubernetes code at the end of the day. All Kubernetes manifests are YAML.
00:19:54
Speaker
And there's only so much you can do with static YAML on Kubernetes, which is why Kubernetes has a Go API; it's written in Go, and there's a Go SDK.
00:20:05
Speaker
And so for more complex things, where you have to deal with the API or deal with Kubernetes in a way that is much more dynamic, that's where you use, let's say, Go to talk to the Kubernetes API, versus sending it YAML files, which is what most people do because they think about their infrastructure as static. And it changes over time, but it doesn't change as frequently.
00:20:29
Speaker
So YAML is fine. It's mostly kind of snapshots, right? Exactly. It's snapshots, data. Do you find that, well, are you perhaps saying that we offer YAML until it becomes too complex, and then there's a Go SDK for Render as well? Or are you working very hard to keep the design within the YAML space so that you don't have to do that?
00:20:52
Speaker
No, I think our philosophy here is to use the best tool for the job. And for this reason, we expose multiple interfaces.
00:21:05
Speaker
We obviously expose YAML as a way to specify your infrastructure. However, we also have a REST API, and you can build
00:21:16
Speaker
really dynamic infrastructure projects using Render's REST API, and a lot of people do. So if you're spinning up things on demand, let's say your user wants a database, instead of writing out a YAML file for Render and then running infrastructure-as-code syncing, you'd rather use Render's API to spin up a new database if your user wants one, right?
00:21:38
Speaker
And so for things that need to be, essentially, programmatic responses to customer needs, it's much better to have an API.
00:21:50
Speaker
And essentially, you could also do the same thing with code, where you generate YAML using code and then use that to sync. But for programmers, the ergonomic solution is actually to use a REST API, right,
00:22:04
Speaker
instead of generating YAML. And so we have a REST API. We also have a CLI, which you can use for other things, because often you're in a command-line environment and you need to be able to, let's say, deploy a service or fetch the logs.
00:22:21
Speaker
But it's harder to do that with a pure REST API, because you could string a bunch of curl commands together and make it really complex. Or you could just use high-level
00:22:33
Speaker
verbs in a CLI, saying, you know, render get with a service ID, and it will show you the service, or logs for a service ID, instead of spinning up a curl command on demand.
00:22:49
Speaker
So it all depends on the context in which you're operating, which

Render's Technology Stack and Performance

00:22:52
Speaker
is why people need different interfaces for different contexts. The ultimate context is, of course, when humans are designing infrastructure and they don't want to deal with infrastructure as code, and they don't want to deal with an API.
00:23:06
Speaker
This is why we have a dashboard, which does everything that you would want to do. Yeah, yeah. What this is making me think of is, I don't know if you know of it, but I'll explain. So there's a tool for deployment called Pulumi, which I've used a couple of times.
00:23:20
Speaker
And their answer to this question is, let's try and make the same SDK available in all the languages if we can. So whatever thing you're currently using, it will feel familiar.
00:23:32
Speaker
Yes. Have you ever been tempted to go down that kind of route where you just spray every possible option out to your users? I don't think that Pulumi
00:23:46
Speaker
is that different from YAML. I mean, at the end of the day, Pulumi is just giving you a better interface, but still to define the end state of infrastructure.
00:24:00
Speaker
And by that, I mean: instead of you writing YAML, Pulumi lets you express your infrastructure in specific kinds of code. And so you can use the built-in niceties of your programming language.
00:24:19
Speaker
If you want to have five servers, instead of specifying five different servers in YAML, with five blocks that are all the same with one thing that's different, you can run a loop in your programming language and say, these are the servers.
00:24:33
Speaker
And so, since programming languages are imperative, I think in some ways Pulumi is trying to use the imperative structure of programming languages to ultimately end up with, essentially, data that defines your infrastructure.
00:24:51
Speaker
So for Render, we think that there are ways to use programming languages to define infrastructure more dynamically.
00:25:02
Speaker
And an example of that is, say, let's think about functions as a service. Typically, when you are writing a Lambda or something else that is a function as a service, that is infrastructure.
00:25:23
Speaker
And it's very closely tied to your code. And it has a convention that is understood throughout: it gets a request, you have to return a response.
00:25:35
Speaker
And that is sort of another way to define what needs to happen when a request comes in. But that is not just infrastructure; it's also defining the behavior, the runtime behavior, which is going one step beyond Pulumi.
00:25:54
Speaker
Because Pulumi doesn't define the runtime behavior of your code. Yeah, okay. Yeah, yeah.
00:26:06
Speaker
Yeah, I think I see what you're getting at. I just wonder if eventually... let me step back. A problem I've seen with all kinds of static infrastructure deployment systems is this problem where you end up with five almost identical servers, and then you're trying to find a templating system and a for-loop system in YAML just so you can try and stay sane with that. And as it grows larger, the problem gets bigger and bigger.
00:26:35
Speaker
Yeah. Yeah, and so which is why I think you need to give people... programmatic interfaces to define an infrastructure than pure YAML. And so YAML is one way to do it.
00:26:47
Speaker
And it works well when things are simpler. But also, you know, going back to the leader in the space, which is obviously Terraform: people have massive, massive Terraform folders in their code base.
00:27:04
Speaker
And they're actually using Terraform's domain-specific language, HCL, which is built explicitly to define infrastructure. And it actually takes every cloud primitive and exposes it as a high-level object or, you know, resource in Terraform's language.
00:27:28
Speaker
And at the same time, anytime you need your infrastructure to be more dynamic and spin up and down in response to things, Terraform breaks down.
00:27:42
Speaker
And so, yeah, any kind of static infrastructure definition is very different from spinning up infrastructure, or spinning it down, or modifying it in real time.
00:27:54
Speaker
And that's why at Render, behind the scenes, we use Terraform to spin up certain parts of our infrastructure. But we also have lots and lots and lots of Go code that does things on top of that, simply to interact with infrastructure.
00:28:17
Speaker
And this includes spinning up databases. There's a lot of work that we do to spin up databases and to manage them. And all that code is written in Go. Okay.
00:28:31
Speaker
Yeah. But for the machines that those databases run on, behind the scenes, you can define a static pool of machines using Terraform.
00:28:41
Speaker
And then, within Terraform, you define an auto-scaling policy. But even that policy is static. Behind the scenes, something is listening for when to scale up and down. And that's essentially a controller of some sort.
00:28:57
Speaker
But Terraform lets you define the controller, while the controller behavior is actually specified in code somewhere. Right. So you yourself are using a mixture of static definitions and code to react to the difficult bits. Yeah, exactly. And presumably that's the advice you'd give me as a user.
00:29:14
Speaker
I think it depends on your needs. For a lot of companies, infrastructure is not nearly as dynamic as it is for Render.
00:29:25
Speaker
I can believe that. A lot of companies simply need, you know, a web service, a background worker, a queue, a database, and some sort of static site, maybe object storage.
00:29:44
Speaker
And their infrastructure needs don't change day to day. They just set it up once using infra as code. And if they do change, then they change maybe in the sense of: oh, I have this VM that has maybe two CPUs, and now I'm seeing more usage, so I need to increase the VM size to four CPUs.
00:30:09
Speaker
And that's where having infra as code helps: you can monitor exactly what changes you're making to infra, you can get it code reviewed, you can go through an audit process, and you have that change in your code base for posterity. That's really the value of infrastructure as code.
00:30:27
Speaker
And for that situation, you really don't need dynamic interfaces. It would be overkill to have it go through some sort of API, or to have to use, you know, an SDK or Go code to define this. You just simply use something like a Render blueprint or a Terraform file.
00:30:45
Speaker
And Render, by the way, also has a Terraform provider for people who really want to use Terraform. Yeah. The last time I used Terraform, I convinced someone else on the team to take that part of the code base over. So that's my feeling about Terraform.
00:30:58
Speaker
I don't blame you. I think, fundamentally,

Observability and Monitoring at Render

00:31:03
Speaker
infrastructure, especially for application developers, is a stumbling block.
00:31:11
Speaker
And I just want my application to run as I expect it to. I want it to be reliable. I want it to scale.
00:31:22
Speaker
I want to push out new changes and have them show up quickly. And I want the ability to roll back changes if something goes wrong. I want to be able to see exactly what's going on in terms of my application performance.
00:31:33
Speaker
I want to see the logs when something breaks. I want to be able to collaborate on it with other people on my team. And so these are all the concerns that trip people up today, because you have to do a ton of work to make this happen on something like AWS, or any hyperscaler for that matter. There's a lot of configuration.
00:31:55
Speaker
And the sad part is that everyone's configuration at the end of the day looks very similar. I mean, I know of it. Okay. Because I'd have thought it's like 90% the same, but all the difference is in that last 10% that people argue over.
00:32:16
Speaker
So the differences there might be in terms of, well, instead of, like I was saying, a two-gigabyte RAM service, I need five gigabytes of RAM. Or instead of three instances of this server, I want to run five instances.
00:32:29
Speaker
I think those are the kinds of differences. Or instead of a Java service, I want a Python service. But it all comes down to, what is the core architecture that you're trying to support?
00:32:40
Speaker
And your earlier question about how Render decides what to build: I think for us, it goes back to, what are the architectures that we want to support? And how are people building applications today?
00:32:52
Speaker
And what are the challenges that they're running into when it comes to the cloud? And how can we think about it in terms of higher-level needs that these architectures have? And how can we solve those needs at the architecture level, as opposed to having everyone spin up a virtual machine?
00:33:09
Speaker
Yeah. Okay, so let me give you a specific counterexample to that, because I can believe that lots of these things are slightly different, but we've mentioned briefly Postgres extensions, right?
00:33:21
Speaker
I was looking at this. I was trying to use Render with a Postgres database that had pgvector installed. And along that path, I found out that you had recently added support for pgvector.
00:33:36
Speaker
But do you have this problem that there is a queue of people saying, I want this little tweak to this standard engine, I want this extension installed in, I don't know, Kafka or something?
00:33:49
Speaker
And there's this wave of micro-demands that need to be solved for people to move forward.

User Growth, AI, and Marketing Strategies

00:33:57
Speaker
Yeah, I think that's a really good point. And this is really the core of how we think about building Render for companies that are scaling.
00:34:07
Speaker
So when you're starting out, your needs look quite similar, because you have the same sort of server. As you scale, often the need to customize things comes from your need to get better performance, or for increased redundancy that creates more reliability.
00:34:28
Speaker
And so with Postgres, as an example, certainly we had people asking us for extensions that are super custom: maybe an open source extension that five people in the world use. And our goal is, again, to serve the 90% market, where Render should be able to work for 90, even 95% of the use cases out there.
00:35:01
Speaker
But there are these 5% use cases where maybe you are building a Postgres company yourself, or you are building an infrastructure provider yourself. And that is where I think you might need very low-level control, which...
00:35:15
Speaker
Certainly, Render is not going to let you patch the kernel. Most people don't even want... 99% of the people out there don't want to think about patching kernels, which is why Render is as successful as it is. And...
00:35:28
Speaker
And you have to tread that line carefully. So with Postgres, again, one of the things that people have asked us for over the years is, well, I want to configure this one parameter in Postgres.
00:35:41
Speaker
And until recently, we had that happen through our support team, unfortunately. Right, yeah. And that's not great, right? Because I don't want to talk to support to change a parameter.
00:35:55
Speaker
And now Render is going to release a feature where you basically just have your Postgres configuration file, more or less, that you can tweak, and we'll maintain the parameter changes. We'll tell you if a parameter change requires a restart.
00:36:12
Speaker
And so we're giving people more power, more flexibility. And that's one way to think about how we've done this for everything else over the years: you have to do a lot of work to give people the right level of flexibility without giving them the kitchen sink, which is the AWS approach. And over time,
00:36:35
Speaker
you do that by continuing to both anticipate your customers' needs, but also just having customers who scale on you, who give you feedback about the kinds of things they need. And we found that as customers have grown on Render, our largest customers often have needs that, when we
00:36:56
Speaker
deliver them, are incredibly similar to the needs of other customers who are coming behind them to that scale. And so we're really learning from our customers about the right interface to build, but we're not submitting to the siren call of offering managed Kubernetes.
00:37:16
Speaker
Because if you did that, people could configure everything, and that's not the right answer. You can do that anywhere. I mean, not anywhere, but there's a ton of managed Kubernetes providers in the world. But how do you build a system where you don't need to use Kubernetes,
00:37:28
Speaker
but at the same time, you get the best parts of Kubernetes that you actually want? That's one way to think about Render. It's a really difficult design problem, finding that sweet spot between...
00:37:40
Speaker
giving people enough power that they'll always be happy, and not giving them so much that you can never get any kind of unity in what you're offering, right? Yes. And it's also possible to do it in a way that gives...
00:37:59
Speaker
different people different levels of control. Like, there are some people who need that complexity, and are sophisticated enough, or have the desire, to work with that complexity.
00:38:11
Speaker
So an example is you could expose a specific flag or configuration parameter in the dashboard, in your API, and in your blueprint, as we talked about.
00:38:28
Speaker
But maybe that parameter is only used by people who are sophisticated enough in their usage and their scale to be using infra-as-code or to be using the API.
00:38:41
Speaker
And so that parameter doesn't need to be in the dashboard, because most people who are managing their Render infrastructure manually through a dashboard never get to that level of scale where they need infra-as-code.
00:38:55
Speaker
And so you have to think about, okay, what is the right place for this parameter? Everything should be in the API at the end of the day, because that's the most flexible way to use Render. But not everything needs to be in your face in the dashboard.
00:39:08
Speaker
Yeah, yeah. You must have, I mean, how large is your UX design team? It's not that large. We're focused on hiring...
00:39:23
Speaker
really world-class talent, and then making sure that we can do as much as possible with as small a team as possible, because the larger you get, the harder it is to coordinate, and sometimes you just move slower, even after you've added people.
00:39:39
Speaker
Often, like, three-person teams can completely outdo a 10-person team trying to do the same thing, because there's less communication overhead. And so, again, you can only solve

AI's Role in Infrastructure and Debugging

00:39:50
Speaker
so many problems with a three-person team. So you have to think about how to structure your teams and your company.
00:39:56
Speaker
But I think it comes from having the right principles that you can apply to multiple situations. And so we have internal design principles and product principles, as we think about how to do X, Y, or Z, that allow our thinking to scale even when we don't have a massive design team.
00:40:18
Speaker
Give me an example. What's one of your principles? Well, actually, it was similar to what I just described, where, again, not everything needs to be in the dashboard.
00:40:30
Speaker
I think that's an interesting way to think about, okay, when does it make sense? Who are we building for? When do we add something to the dashboard versus not? But another principle is
00:40:45
Speaker
minimizing surprise. And by that, I mean,
00:40:53
Speaker
the dashboard, or the system, should do what most reasonable people expect it to do, and should not behave in surprising ways, as much as possible.
00:41:07
Speaker
And when you get down to the micro-interactions, I think it's really important to think about, okay, if you did this, is this similar to...
00:41:21
Speaker
how people interact with, let's say, the most basic thing: UI modals. Let's not try to change how modals work in Render. Let's make that similar to what people expect from UI interactions, because other applications behave in a certain way.
00:41:42
Speaker
And another example of this is, well, what are people used to when you think about filters? How should filters work? Our users are application developers, but when we think about filtering in all kinds of apps, we've gotten used to a certain way of filtering things.
00:42:05
Speaker
And let's not try to build an entirely radical way of filtering that is so different that it confuses people. Yeah.
00:42:15
Speaker
Okay. You have just given me a bridge to a thorny technical problem I want to get to behind the scenes. Because when I go into the web interface and I say, stand up a new Postgres server,
00:42:31
Speaker
especially in this kind of model, these are inherently asynchronous things, right? The UI is going to have to come back and say, I'm standing that up. And eventually, what, two, three minutes later, it's going to have to update and say, it's now ready.
00:42:47
Speaker
So I think it'd be very interesting to take the path from that first click all the way through the system and back. How do you handle that messaging? Absolutely. So let's say you click a button saying, create this Postgres.
00:43:02
Speaker
From the dashboard, it's an API call to Render's API, and the API itself will run a bunch of checks: make sure it's authenticated, that you have the right plan, and all of that. But then the API issues a request to another service, which is effectively another API server built just for the infrastructure. So you have an API server for the Render control plane, and then you have the infrastructure control plane,
00:43:39
Speaker
which has its own API server. And so that call then tells the infrastructure at a lower level what kind of Postgres to create.
00:43:49
Speaker
And it might require the creation of two or three different resources, right? So as we were talking about earlier, you have a disk, you have a CPU process, you might have other processes like a read replica.
00:44:05
Speaker
And so that lower-level infrastructure call is serviced by this API server, which is then responsible for spinning up VMs or spinning up containers, and calling the underlying resources, allocating resources.
00:44:24
Speaker
And so the call from the original dashboard returns immediately, but then a polling process, or a WebSocket process, effectively starts.
00:44:37
Speaker
At Render we use both WebSockets and polling, depending on what it is. And we have a Go API, and we use channels; channels are a good way to deal with asynchronous messages. I was wondering if channels or a queuing system was going to come into this.
00:44:57
Speaker
Yeah, and behind the scenes we also have our own queuing systems and workflow systems. And then, at the end of the day, there is a thing that is listening to changes in the underlying, low-level infrastructure.
00:45:15
Speaker
So when a disk gets created, it sends a bunch of messages to whoever's listening: hey, a disk got created. And then if that's a disk for this database, I can signal to my REST API channel: look, this database just got created, or the disk for this database is created.
00:45:40
Speaker
And then with Postgres, actually, in addition to spinning up the infrastructure, you have to wait for the Postgres process to start up and be ready. Exactly. To be ready to accept connections.
00:45:55
Speaker
And then you have to create the network connections that we were talking about earlier. You have to assign it an internal IP. You might have to assign it an external IP.
00:46:06
Speaker
There's a bunch of network configuration happening at the same time. But ultimately, what you're doing is listening to these events in the system, in our core infrastructure processing code, and responding to each event as it happens.
00:46:30
Speaker
And one of these events will be: okay, now the Postgres process is online and is ready to accept connections. And at that point, if you're polling, you'll get a request two seconds later saying, hey, is this ready?
00:46:45
Speaker
And then you send that back to the user. Or if it's a WebSocket, then you just respond on the user's WebSocket saying, this is ready. But everything I described, even though it was long, is still an oversimplification.
00:47:02
Speaker
Yeah, I can believe that. But we'll try and get an architecture diagram without actually using diagrams, for a podcast. I'm surprised, though. I mean, this seems like a classic persistent workflow problem with an event system.
00:47:19
Speaker
Yes. And I would have thought you can do that with channels, but you'd end up reinventing a wheel that's already out there. We do use channels also.
00:47:30
Speaker
So just to be clear, I think the problem with channels is they're an in-memory solution. And if the Go process dies for whatever reason, then you've lost all that state, right?
00:47:44
Speaker
And so you want some sort of persistent state. And this is why you build a persistent queuing mechanism, where things keep going even if the process dies. You don't want your Postgres to never be created because...
00:48:00
Speaker
the Go process died midway. And so even if things fail, which is very rare, for that rare situation you still want it to work.
00:48:16
Speaker
And that's why I think you need to build in this level of reliability, which at the end of the day requires some sort of persistent mechanism that stores data on disk somewhere.
00:48:30
Speaker
Did you go custom for that?
00:48:33
Speaker
I think it's a combination. It's partly custom. And then behind the scenes we're also using, for example, when we talk about this infrastructure control plane layer, part of that control plane is the Kubernetes API server. And Kubernetes has controller managers and its own loops going on behind the scenes, but it also has etcd, which is the Kubernetes state store, and that stores data on disk.
00:49:07
Speaker
And so at the end of the day, you're using Kubernetes mechanisms to store persistent state and to track how the world is changing as events happen through the system.
00:49:24
Speaker
But then we built our own things on top, where we have a persistent queue. We use Redis, persistent Redis. And we make sure that
00:49:39
Speaker
we have our own Go background workers listening on it. And we're also soon going to start using our own workflow orchestration product for something like this. That product is in alpha right now, but I'm really excited to transition more of our own needs for defining workflows and asynchronous tasks to
00:50:09
Speaker
being serviced by our own product, which we're also going to offer to our customers. Ah, see. I was wondering if you were saying this is a purely internal thing, but you're dogfooding it at the moment.
00:50:26
Speaker
Yeah, we're starting to. I mean, it's still in very early alpha. And so you have to be careful about where you dogfood it. You can't just put it in production for the millions of users you have, right? Yeah. It's sort of like how your status page should not run on your own infrastructure if you're an infrastructure provider. But yes, our goal is to mature this product by using it internally as much as possible.
00:50:58
Speaker
And we have so many use cases for it internally, so I'm really excited. Okay, I'm going to have to ask you, because I don't think we've had that many Go users on this podcast yet.
00:51:10
Speaker
But we've had Erlang people, and they would answer the questions I've just been asking you in a very different way. So I guess the question is, how happy are you with Go? How well-suited do you think it is to what you're doing? Is there anything you're jealous of from other languages?
00:51:34
Speaker
We're actually very happy with Go. And the primary reason...
00:51:45
Speaker
for my answer is that Go is today the best language to work with infrastructure, because a lot of infrastructure code is already written in Go.
00:52:03
Speaker
And so it's kind of a network effect. Yeah. Okay. I can see that. And part of that comes from Google, because Google is responsible for a lot of infrastructure code, including the core of Kubernetes, but also a lot of other projects. And Google is responsible for Go as well.
00:52:21
Speaker
And so that created this network effect where more new infrastructure projects were also written in Go. And it makes it much more ergonomic to manage infrastructure, because you can just use Go SDKs. A lot of these things don't even have SDKs in any other language. Or if they do, they're not first-class SDKs.
00:52:49
Speaker
But to answer your question, the thing that I miss: if I were to pick a general-purpose programming language, I'd always pick Python.
00:53:00
Speaker
Okay. Because it's just faster to get things done with Python, in my opinion, because it operates at a higher level. Now, Python is not as performant as Go, so there's that. And again, you have to use the right tool for the job.
00:53:18
Speaker
And if Render's infrastructure were all written in Python, it would be much slower, and it would cost us a lot more, because we'd be spending 10x as much on CPU and memory.
00:53:35
Speaker
Yeah, I can believe it actually would, for the kinds of workloads you're doing. Probably, yeah. Because the most obvious example is we have this load balancer, or router, that takes every single request that comes into Render and figures out how to route it, which...
00:54:00
Speaker
which service of the millions of services that run on Render to route it to. And you can imagine performance being incredibly, incredibly critical for something like that.
00:54:11
Speaker
And by performance, I mean how quickly you can do it, but also how much CPU it needs, how much memory it needs. And so, right, how long is your thread pool going to get before you have to spin up another machine for your load? Exactly. And we serve well over 100 billion requests every month, and these are just web requests.
00:54:34
Speaker
These are not even the internal requests that user services make to other user services, or to their own services. So is this just the HTTP requests coming into Render?
00:54:45
Speaker
And that's more than 100 billion. And these are served by our Go code. And I'll tell you, I'm really happy with Go, because I haven't felt the need to rewrite it in Rust or something. Okay.
00:54:56
Speaker
Just to be clear, are you saying 100 billion is the combination of people using Render, and the people hosting on Render whose users are using it? It's their users, yeah. These are HTTP requests going to all the services that the three million plus Render users have hosted on Render. So it's not API requests to Render. Right, yeah. Still, that number is kind of terrifying.
00:55:25
Speaker
That would keep me up at night. I mean, it does keep us up at night sometimes, certainly when there's a page, an alert going off because something's happening.
00:55:37
Speaker
But we've gotten to this number over time. This is not something that happened overnight, and we've been around a while now. And the way we've done this is very intentional and careful. You run...
00:55:56
Speaker
load tests, you run benchmarks, and you start to find patterns to optimize over time, and you start to build various kinds of caches and other things. And so this is all state that this layer needs to keep.
00:56:13
Speaker
And we built it so that even if our core Postgres or whatever the state is for customer workloads goes down, this layer keeps operating. um And that itself requires a lot of engineering on top.
00:56:25
Speaker
And obviously there's a scaling and autoscaling part of it, because our users do get massive spiking requests and so on. But when you reach a certain scale, each individual customer's spikes don't really matter as much. You know, I remember back in the day, Render hosted the infrastructure for a U.S. presidential primary campaign.
00:56:56
Speaker
And this is not the presidential campaign, it's the party's presidential primary. And one of the candidates was using Render. And there are these primary debates.
00:57:09
Speaker
And so this candidate would speak about their platform, and at the end of the debate, there's a closing statement. And every time they delivered their closing statement,
00:57:21
Speaker
requests to their website shot up. And in 2019, that was actually a big part of our overall traffic. And so one user could still be very, very influential to how we were doing. But obviously, these days, that's not a problem.
00:57:38
Speaker
I have to say, weirdly, the American presidential primaries make the news over here. And I think it's the only case in politics where we hear about the pre-election races of other countries.
00:57:55
Speaker
They are sufficiently famous. They're famous enough that they actually impact our culture too. Which, I don't know what to make of that, but I can certainly see that there's going to be a lot of traffic spiking out of it.
00:58:09
Speaker
Oh, yeah. Yeah, certainly. And it taught us a lot. It must have taught you a lot about observability, I'm guessing. Observability, certainly, absolutely. How do you build the right level of observability at every layer in the stack?
00:58:25
Speaker
That's really important. How do you do it? I mean, again, I'm wondering if you've got queues or actors streaming telemetry out internally. Yeah, so we do have different levels of observability. I mean, the most basic level is...
00:58:45
Speaker
at the hardware, the virtual machine level: the CPU and memory on a machine. Behind the scenes, we use Victoria Metrics as our core layer that receives all these observability events,
00:59:07
Speaker
checks or metric observations, across our core infrastructure. And we moved to Victoria Metrics from Prometheus because we found it to be much more performant for our scale.
00:59:25
Speaker
I've not heard of Victoria Metrics before. It's a great product, and it has actually scaled quite well with us, which I can't say for most products. So if you're looking, I mean, I don't know anyone over there, and...
00:59:42
Speaker
I'm not connected to them in any way. I just know that we've been able to do a good job scaling with their product. When we first started, we were using Prometheus, but now we use Victoria Metrics behind the scenes, and it's a cluster, it's distributed.
00:59:59
Speaker
And you also have to build out a system taking logs from every single machine, from every single customer process, because you need to show customers logs, right? And you need to build a really performant, scalable system to ingest logs and send them to the right places.
01:00:22
Speaker
And behind the scenes, we're using Loki, which is a log aggregation system, again built for scale. And we're using Vector for transformation and sending things to the right places. Loki is the thing that stores the logs, and it's built by the same people who built Grafana.
01:00:47
Speaker
Okay. Yeah. And so we use Grafana. In some places we use Datadog, although Datadog is madly expensive, so we've reduced our Datadog use.
01:00:59
Speaker
I've heard that criticism. And we brought most of it in-house at this point, and we built our own systems across the board for observability. The interesting thing for us is that a lot of the observability data, whether it's logs or traces or just plain metrics, we have to show a lot of it to our customers for their services.
01:01:26
Speaker
And so we have to capture it across, again, millions of applications running on Render, and make sure that we can do this reliably and scalably. But then we also have our own diagnostic systems that have nothing to do with customer processes. I was wondering if it was a single unified system that treats you the same, or if you have a separate one. What's the separation of those two layers?
01:01:49
Speaker
The most obvious one is that for our users, we built the UI for them to see everything, as opposed to just giving them a Grafana dashboard.
01:02:05
Speaker
Yeah. Right. And the core of the
01:02:14
Speaker
metrics aggregation layer and the core of the logging infrastructure, that's the same. And it doesn't make sense to have two different systems.
01:02:27
Speaker
But we, for example, have to monitor more than our users need. So we have logs, or CPU or memory metrics, or other kinds of things that are much lower level: at the VM level, at the hardware level. And network-level
01:02:47
Speaker
monitoring is very different. So we use NetMon for a bunch of things. But our users really care most about external bandwidth. So for them, it's really important that we show them the bandwidth being used by their services.
01:03:06
Speaker
But they don't care about the internal network monitoring that we have. Is it integrated into this system, or do you have another system for billing? I'm thinking you must have something that captures how many gigabytes I transfer a week.
01:03:23
Speaker
And is that the same system? So the system that shows users their usage is the same system that is used for billing as well. Yes. Okay.
01:03:34
Speaker
Because we're collecting usage data in the same place. But then, obviously, we have the system that actually figures out how much to charge them based on that usage data, and does other things. But the collection mechanism is the same.
01:03:48
Speaker
Okay, yeah, that makes sense. I was wondering, you'd have a synchronization problem without that, right? Yeah, exactly. We have to have a single source of truth, because, again, the element of least surprise: you should always have the same data across multiple places in the system.
01:04:11
Speaker
Yeah, that makes total sense. So I'm going to move back into user space, because there is one particular metric I saw you post on LinkedIn, and I would like to know how you do this, because I want to steal whatever your strategy is, if I can be brazen about that.
01:04:29
Speaker
I saw you post on LinkedIn, maybe a couple of months back, that you were acquiring 150,000 developers a month. Correct me if I'm wrong.
01:04:41
Speaker
Well, it has been corrected upwards, but I can't share. It's a much higher number now, but that was the last public number. Okay, that was something of a humble brag, but I love it. I don't know how you get 150,000 or more users onto a platform.
01:05:00
Speaker
And as I do things in the developer relations space, I would like to know: how are you getting that kind of growth? What do you think it boils down to? So there's a lot of different factors that have contributed to this growth over the last several years.
01:05:16
Speaker
You know, we were signing up 75,000 developers a month in maybe November of last year. And the 150,000 number comes from, I think it was March or April of this year. So we doubled that number really quickly. And this is just the new users every month. And I can tell you why that happened.
01:05:42
Speaker
Yeah, that sounds like a good one to know. Okay. So fundamentally, we're living in this really exciting time, where it has become much, much easier to create and launch new applications.
01:06:02
Speaker
And there are smaller teams, or individuals, who are able to do a lot more. And I'm still talking about software developers here, not the people who don't know how to code. The people who do know how to code are able to do a lot more, and they're able to test their ideas much more quickly.
01:06:18
Speaker
The bar to starting something and getting it out there, just from a code standpoint, is much, much lower now because of AI.
01:06:29
Speaker
Because if there's one thing that LLMs are really good at, it is writing code and helping you generate new code and therefore generating new applications.
01:06:40
Speaker
And Render is, arguably, obviously I'm biased, but I think Render is one of the best places on the planet to host a new application.
01:06:52
Speaker
We make it incredibly easy for you to get started, and then to scale from there. And so when people look around, they have these new applications, and they're like, all right, I need to put it in the cloud. The number of new applications being created is just much higher than it was before.
01:07:06
Speaker
And that has led to more people using Render and other things like it. So I'd say that doubling, or more than doubling, is not just applicable to Render. It's applicable to others as well. Obviously, they might have smaller bases.
01:07:24
Speaker
So they might go from, I don't know, whatever it is. But I think for us, that growth continues, which is why, again, our number is much higher now than it was. Okay. And so there's just many more new applications in the world, and Render just happens to be a really great place to deploy new workloads.
01:07:46
Speaker
Okay. Yeah, I can definitely see that AI is creating more services, because I've created a bunch of side projects in the past month that I wouldn't have had time to get round to before AI.
01:07:58
Speaker
Exactly. And I have actually seen it try to deploy things to AWS, and it has not gone well, even with the AWS CLI installed. Are you, though, trying to somehow seed ChatGPT or Claude or whatever with knowledge about your company?
01:08:26
Speaker
You must have done a lot of work making it easier for human beings to use Render, but are you now actively doing work to make it easier for LLMs to use Render? Is that a marketing strategy?
01:08:38
Speaker
Well, it's actually more of a product strategy. As we think about the next 12 months: how do you make Render easier not just for people who are building with LLMs, but for agentic workloads themselves?
01:08:56
Speaker
So agents need to spin up new kinds of compute as they perform a series of tasks. How does Render become the default platform for agents and humans to work together to build these systems?
01:09:11
Speaker
But to go back to marketing for a second: no, we haven't actually done anything to influence our ChatGPT rankings. I wish we had a clear answer.
01:09:23
Speaker
So we don't really have a marketing team. We just have a marketing leader who joined six weeks ago, and she will build out a team, and we will obviously work on making Render more visible to LLMs, and obviously to Google. Again, we haven't done any SEO work either.
01:09:45
Speaker
But the good thing is that when you have developers who really love your product, then they do that work for you. It's not really work; developers just like telling each other about great products.
01:09:59
Speaker
I think that is just a fundamental truth. And it's almost somewhat consumery in that sense, versus, you know, B2B products. People using B2B SaaS are not necessarily going around telling other people, oh, you should use this B2B SaaS, I'm really excited about it.
01:10:14
Speaker
But developers are like, oh my God, I found this really easy, and you should use Render. Yeah. And so we see a lot of that happening on social media and on blogs and in Slack channels.
01:10:27
Speaker
And LLMs are ingesting all this data. LLMs are nothing if not a reflection of what people are saying about something on the internet.
01:10:38
Speaker
And so because there are so many Render users already talking about Render on the internet, LLMs are also recommending Render as the place to deploy your app.
01:10:51
Speaker
Okay, yeah. I don't know how you replicate that, but I can see why you'd be benefiting from it. Totally.
01:11:02
Speaker
Do you worry about feeding into that? I mean, I can see a future where companies like Anthropic get more savvy to this and say, hey, Render, we will charge you $100,000 a month to make you the platform that our AI tends to recommend for deployment.
01:11:26
Speaker
I think that's a really tough question to answer for companies like Anthropic, or anyone building these foundation models, because one of the reasons people use foundation models today is that they think they're getting better answers than Google.
01:11:45
Speaker
These things haven't been gamed. They don't have sponsorships or sponsored posts. Yet. Yes, yet, right? But you and I would trust an LLM a lot less if we knew it was striking these deals.
01:12:00
Speaker
And that's a challenge for LLM providers. Yeah. How do you maintain customer trust?
01:12:11
Speaker
With Google, at least, it's obviously different now, but back in the day Google always had the sponsored post label, which was very clear. And so there is a world in which
01:12:24
Speaker
I think LLMs build something like what Google did back in the day, where there's a sidebar with ads. And obviously those ads made it to the top eventually. And now they're virtually indistinguishable from search results.
01:12:38
Speaker
Yeah, I can totally see that future. I'm not sure; I think we'll have to see how it plays out. I doubt we can rely on the goodness of their hearts, but we might be able to rely on economic forces making it bad business to behave that way.
01:12:54
Speaker
Yeah, I don't think you can rely on the goodness of anyone's heart when it comes to business. Sadly. And you're right in the heart of it in Silicon Valley, so I'm sure you would know.
01:13:06
Speaker
Oh, the stories I could tell.
01:13:10
Speaker
Okay, I have one last question. Again, it's about... unless you want to tell stories? Perhaps there are stories you want to tell. No, no, I'm going to keep this simple.
01:13:23
Speaker
Okay, okay. We'll save that one; we'll meet at a bar sometime and you can tell me all the horror stories. That sounds great. Okay, so there's another thing I've been thinking about, and it relates to LLMs and the way we're changing how we program.
01:13:37
Speaker
Do you think we're going to be moving to a space where a provider like you also needs to support LLMs for debugging? Yes, absolutely.
01:13:48
Speaker
I think if there's one thing, again, that LLMs are good at, it's ingesting large amounts of data, finding patterns in it, and surfacing those patterns.
01:14:00
Speaker
And debugging, at the end of the day, is exactly that: humans looking at lots of different data from different places. It's logs, traces, profiling data, correlating it with the source code, and looking at abnormal behavior, abnormal requests.
01:14:23
Speaker
All of that data can and should be fed into an LLM as the right kind of context, with the right kind of prompts, to help developers find that needle in a haystack: what went wrong.
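A minimal sketch of what assembling that debugging context might look like. The data shapes and field names here (logs, traces, commits) are invented for illustration; this is not Render's actual API, just the idea of bundling the signals a platform already holds into one prompt.

```python
# Hypothetical sketch: packaging platform data as LLM debugging context.
# The field names below are assumptions, not any real provider's schema.

def build_debug_context(logs, traces, commits, max_log_lines=50):
    """Bundle logs, traces, and recent commits into a single prompt string."""
    sections = ["## Recent logs (most recent last)"]
    sections.extend(logs[-max_log_lines:])
    sections.append("## Slow or failing requests")
    sections.extend(f"{t['route']}: HTTP {t['status']} in {t['ms']}ms" for t in traces)
    sections.append("## Commits deployed since the last healthy state")
    sections.extend(f"{c['sha'][:7]} {c['message']}" for c in commits)
    sections.append("## Task")
    sections.append("Identify the most likely cause of the failure and suggest a fix.")
    return "\n".join(sections)
```

The resulting string would then be sent to a model alongside whatever prompt framing the provider chooses.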
01:14:43
Speaker
And whether Render builds it directly into our system from scratch, or whether we partner with someone, or maybe there are other companies... I mean, Datadog has an AI SRE now, and Datadog
01:15:03
Speaker
could effectively ingest all this data, and does ingest a lot of data from Render already today. And so if you're using Datadog for a bunch of other things, you could use the AI SRE within Datadog for it. But I do think AI debugging is going to become almost the default once these tools get better.
01:15:24
Speaker
Yeah, I can see it being first-line support very easily. And I can see there being demand for your telemetry systems to feed into, I don't know, a RAG system, or for you to pre-chew prompts about failures.
01:15:39
Speaker
Yes, absolutely. I think Render is actually uniquely placed, as an infra provider, to have all the data you need for debugging.
01:15:53
Speaker
Because, guess what, when something goes wrong, we're the first ones to know. And because Render works directly from when you push something to GitHub, and takes that all the way to production, Render has access to the entire lifecycle of your application.
01:16:13
Speaker
And we have your commit data, because we deploy on every commit. And obviously, with the right kind of user permissions, and making sure this is something users want us to do, you could build a system that looks at all of this data for users and helps them with that first line of defense, like you said, to give them a sense of what's going on.
01:16:43
Speaker
Is that something you're actively working on?
01:16:47
Speaker
I can't speak to future products. Oh, I thought I'd lulled you into a false sense of security there and could get away with it. I'll say, clearly, it's something we're actively thinking about.
01:17:05
Speaker
I mean, I just don't know how... Okay, maybe that's a question for someone else one day. But obviously there's a vast amount of data, from the GitHub commits to the load balancer to the database logs, and you've got it all there. Exactly.
01:17:19
Speaker
And I don't know how you boil that down, but I can see how useful it would be when my app crashes. Yeah, it's also really useful within Render because you have these interactions between different services. Like you said, there's a database, there's your, let's say, website backend, but maybe there's a third internal service that that website is talking to.
01:17:42
Speaker
And then you're making these external API calls. And so Render has essentially all of that. And not just Render; AWS does too, for AWS users. All the cloud providers will have this sort of data.
01:17:56
Speaker
And the question is: do you rely on something like a Datadog to specialize in building this tool, or do you build one internally?
01:18:15
Speaker
I think the answer is probably both, because, for Datadog, increasingly people are becoming multi-cloud.
01:18:28
Speaker
So people have systems from multiple cloud providers. And I think one of the reasons Datadog, or any observability company, can be successful is that they have integrations with all these cloud providers and systems, and they can give you this unified view.
01:18:45
Speaker
At the same time, I think Render has better information than Datadog will ever have about your systems running on Render. So I think it's Render's responsibility to also do this work at some level.
01:19:00
Speaker
But then I think Datadog could do this at, let's say, a different level. And so that's why the answer would be both. Yeah, okay. Yeah, I can see that. For different reasons, I think you'd both be crazy not to be working on that at the moment.
01:19:15
Speaker
Yeah. Okay. So, speculatively: what do you think the future will look like? Am I going to find that one day I get a message from my system saying, the web server has crashed, here's our diagnosis of why, and here's what you should do. Would you like to click yes?
01:19:37
Speaker
So Render already does about 90% of that, except for the "would you like to click yes" part. We do, like, send you an email or a Slack notification when things are down, and we give you the process exit code and the log message. We can link to the log and you can see what's going on.
01:19:57
Speaker
But yes, there are already systems that self-heal. Already, you know, when things go down, you spin up a new version of the app. And Render already does this: it has the ability to restart your app automatically if it's unresponsive.
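The restart-on-unresponsive behavior boils down to a simple decision rule. This is a generic illustration of that rule, not how Render actually implements it; the function name and threshold are made up here.

```python
def should_restart(health_history, threshold=3):
    """Decide to restart once the last `threshold` health checks all failed.

    `health_history` is a list of booleans, oldest first, one per check.
    Requiring several consecutive failures avoids restarting on a single
    transient blip.
    """
    recent = health_history[-threshold:]
    return len(recent) == threshold and not any(recent)
```

A supervising loop would append the result of each periodic health check to the history and trigger a restart (and reset the history) whenever this returns true.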
01:20:17
Speaker
I think what we'll see is increasing intelligence being built into these systems. And with that increased intelligence, maybe handling of more complex failures than simply your app being unresponsive.
01:20:34
Speaker
Handling more complex failures would lead to more automated actions. And a really simple, procedural example of that is: I'm looking at my queue length. If I'm running background workers that work off a queue, I want to make sure my queue never exceeds, like, 10 items.
01:20:57
Speaker
But if it's at 11, spin up a new worker automatically and start reducing my queue length. That's the most basic example today, but I think we'll see an explosion in the kinds of things that can then be automated, whether it's scaling, whether it's alerting,
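The scaling rule described above can be sketched as a tiny function. The names and thresholds are hypothetical, chosen to match the example in the conversation; this is not Render's autoscaler.

```python
def workers_needed(queue_length, current_workers,
                   target_queue=10, items_per_worker=10, max_workers=20):
    """Return the worker count for the next scaling step.

    Adds one worker per `items_per_worker` items of backlog beyond the
    target queue length, capped at `max_workers`; never drops below one.
    """
    if queue_length <= target_queue:
        # Backlog is within target: no scaling action needed.
        return max(1, min(current_workers, max_workers))
    backlog = queue_length - target_queue
    extra = -(-backlog // items_per_worker)  # ceiling division
    return min(current_workers + extra, max_workers)
```

A scheduler would call this on each tick and reconcile the actual worker count toward the returned value.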
01:21:19
Speaker
AI can and will help infrastructure be more autonomous. And I think we're moving towards a world where infrastructure can do a lot more autonomously.
01:21:36
Speaker
It won't do everything, but it'll do much more than it does today. And that's also the future we want to build with Render: autonomous infrastructure. I think that is truly the future we want to live in.
01:21:48
Speaker
Do you think you'll sleep better at night with that in place, or worse, as the person responsible for it? My sleep will be all right no matter what. And to me it's actually really exciting that we're in this world where these tools are available to us, tools that can do more and that are smarter.
01:22:06
Speaker
And obviously you have to build it in a way that doesn't have things like false positives and doesn't take the wrong actions. So there has to be a very stringent level of control over what you allow AI to do autonomously.
01:22:21
Speaker
And there often has to be a human in the loop. But to start with, that human in the loop will get a lot more information up front, information they would have to spend an hour or two trying to obtain today.
01:22:33
Speaker
Yeah, yeah. I would have thought just being able to pull together the Git logs with the deployment and crash logs... I mean, these things thrive on good context. I would think you're in a very good place to pull that all together.
01:22:46
Speaker
I look forward to seeing what you do with it over the next year, which is going to be a very busy year, I'm sure. It will be, and I'm excited about it. Cool. I'll leave you to go and build it. Anurag, thank you very much for joining me.
01:22:59
Speaker
Thank you, Chris. This was great. Thank you, Anurag. I'll tell you one quick fun story before we go. So I was trying out Render, kicking the tires on it, unsurprisingly, just hosting something small, a small website.
01:23:13
Speaker
I committed a change to the code and pushed it, and I'd set up their MCP server. So I said to Claude, which I've been using a lot lately, has my change gone live yet?
01:23:25
Speaker
And it whirs away and queries their endpoint and says, no, it's not gone live. I've looked at the logs, and it looks like your most recent commit introduced a linting error, which caused the GitHub action to fail.
01:23:39
Speaker
Here's the diff that would fix it. Would you like me to commit that change and push? And I said, yeah, that sounds great. And it whirs away again, and after a few minutes says, yep, that change has now been pushed and it's live on the website.
01:23:55
Speaker
And here's the URL. What a future to live in. And then, two minutes after that, I get an email from GitHub saying that the first deployment had failed. Way too late. Way too late to do anything about it. The way we provision things, the way we deal with diagnosing errors, the whole world of frontline support: it's all going to change.
01:24:17
Speaker
Email notifications are going to look completely vintage. These are crazy times to be in, and we'd best go and figure out what the crazy times look like. So before you go: if you've enjoyed this episode, please take a moment to like it, rate it, share it with a friend or on social media, and make sure you're subscribed, because we'll be back soon, of course, with another interesting voice from the software world.
01:24:40
Speaker
But until then, I've been your host, Chris Jenkins. This has been Developer Voices with Anurag Goel. Thanks for listening.