
/resilient systems: fail-proof by design

The Forward Slash Podcast

Can your software handle failure—or write its own test code? In this episode, we dive into Temporal workflows, agentic AI, and no-code automation tools like n8n to explore how developers can build resilient, scalable applications faster. Learn how to navigate unreliable APIs, implement AI-powered code generation, and prototype with no-code platforms to supercharge your software development process.

Transcript

Exploring n8n and Personal Automation

00:00:02
Speaker
After about three weeks of messing around with n8n, I would probably say I fell in love with it. I deployed my own personal instance of it to my Kubernetes cluster, and I use it for all kinds of random personal projects. I built a little crawler that watches a hotel that my wife likes that's nearby, but it's always booked out a year in advance.
00:00:27
Speaker
So I have a crawler that notifies me when rooms become available at this hotel.

Introducing Austin Kurpus and Bitovi

00:00:39
Speaker
Welcome to the Forward Slash Podcast, where we lean into the future of IT, inviting thought leaders, innovators, and problem solvers to help slash through its complexity. Today, we have with us Austin Kurpus.
00:00:52
Speaker
Austin Kurpus is a developer at Bitovi with a passion for building thoughtful, maintainable software. He's all about creating great user experiences and solving tricky problems with clean, effective code. When he's not deep in a project, you can find him sharing thoughts and side projects over at austinkurpus.com.
00:01:15
Speaker
Thanks for being with us today, Austin. So you work for Bitovi. They're a consulting company similar to Callibrity, right? Yeah, that's correct. I've been with Bitovi for a little over eight years now.

Backend Management and JSON Formatting

00:01:25
Speaker
And you're out of the Chicago area? Yeah, we're completely remote, but most of the board happens to live in Chicago, so that's where the company address is. But even the board is fully remote.
00:01:38
Speaker
Nice. Nice. I love Chicago, great town. So you work on a lot of backend systems, is that correct? Yeah, I'm one of the team managers for the backend department, so I focus primarily on backend systems these days. I try to stay away from the front end if I can avoid it.
00:01:58
Speaker
Yeah, it's unfortunate. For me, I'd always been a backend guy. Oh, I don't know if it's unfortunate; it's kind of fun, actually. But when I do side projects now, it's kind of boring just looking at JSON strings all the time. So I do have to dip my toe into doing some UI stuff every now and then. I'm not terribly great at it, but...
00:02:17
Speaker
it's fun, I guess, every now and then. It's kind of fun to actually see something come to life on a screen instead of just some text scrolling by. Come on, JSON's great to look at. I think we should replace UIs with just a bunch of JSON.
00:02:29
Speaker
Now, I'm no slouch. I do pretty print the JSON, you know what I mean? I do that. So it is very nice looking when it's scrolling by, just like I'm in the Matrix.
00:02:41
Speaker
All right, but here's the question. Do you pretty print it with two spaces, four spaces, or a tab? I think it's two spaces, though. All right, that's right. That's good. Two spaces is the right answer.
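For reference, two-space pretty printing is a one-liner in Python (a throwaway example, with made-up data):

```python
import json

payload = {"status": "in_stock", "rooms": [{"date": "2025-06-01", "available": 2}]}

# indent=2 gives the officially endorsed two-space style
print(json.dumps(payload, indent=2))
```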
00:02:54
Speaker
It's definitely not tabs. I mean, come on, this is 2025. Are you kidding me? Yeah, I think we won that battle. Exactly. All right, so backend systems. Are you writing APIs as a service, or API products, or are you doing

Agentic AI and No-Code Tools

00:03:14
Speaker
integration-type work? What do you get into these days?
00:03:16
Speaker
For the majority of the last couple of years at Bitovi, I've been consulting in Temporal, doing a lot of different stuff, some of it related to agentic AI, some of it related to B2B products and things like that.
00:03:32
Speaker
More recently, I've been mostly focused on consulting in the agentic AI realm, specifically building AI agents for some of our enterprise customers using no-code tools.
00:03:48
Speaker
So mostly focusing on prototyping, using tools like n8n, and then longer term I'm sure we'll move those out to a more durable solution like Temporal. But that's kind of where I've been for the last eight months or so: really just building AI agents.
00:04:05
Speaker
Tell us a little bit about what that is and why you choose to use it. Why is it an effective tool for what you do? Yeah, great question. So Temporal is a durable execution runtime and platform that allows you to build resilient applications without thinking too much about the redundancy factors and what happens when things fail.
00:04:28
Speaker
Temporal is fundamentally just an abstraction layer over a queue, but it's a really, really nicely designed abstraction layer that's built to solve a lot of fault-tolerance edge cases, essentially.
00:04:44
Speaker
So you can really focus on building your application code, writing your business logic, and you don't have to think about what happens if there's some sort of intermediate failure.
00:04:54
Speaker
Maybe a worker crashes, maybe there's an intermediate network failure. You spend very little time thinking about those failure scenarios when building things with Temporal, as opposed to other solutions.
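As a rough sketch of what that looks like with Temporal's Python SDK (the workflow and activity names here are made up for illustration), the failure handling is mostly declarative: retries, timeouts, and backoff are configuration, and replay takes care of crashed workers:

```python
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def send_invoice(order_id: str) -> str:
    # Side effects (network calls, DB writes) live in activities;
    # Temporal records their results so replays don't rerun them.
    return f"invoice-{order_id}"  # a real version would call an external API

@workflow.defn
class InvoiceWorkflow:
    @workflow.run
    async def run(self, order_id: str) -> str:
        # If the worker crashes here, another worker replays the event
        # history and resumes; no manual cleanup, no dead-letter queue.
        return await workflow.execute_activity(
            send_invoice,
            order_id,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                backoff_coefficient=2.0,
                maximum_interval=timedelta(minutes=1),
            ),
        )
```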

Temporal and Application Resilience

00:05:07
Speaker
The workers, the things that actually do the activities that need to be done and orchestrated, those you kind of control. You run those workers yourself on your own hardware, or in the cloud, or whatever.
00:05:18
Speaker
And then they kind of phone home back to Temporal, and Temporal takes care of that state management. It knows "I've got to finish this task and then go on to another one," so it manages that "where am I along the journey" kind of stuff, right?
00:05:32
Speaker
Yeah. The Temporal service essentially orchestrates everything, and then you're responsible for building and deploying your workers. So you manage all of the code, you manage the actual application runtime.
00:05:46
Speaker
You build the worker using their SDK, which communicates with the Temporal service to orchestrate everything. And that provides a runtime
00:06:01
Speaker
that offers durable execution. So essentially, as things happen in your workflow, especially non-deterministic things, Temporal is keeping track of the state of everything as it happens.
00:06:12
Speaker
And so if there's some sort of intermediate failure, even if a worker completely crashes and goes offline, the worker is able to use Temporal to replay everything that's happened up until the failure and recreate the internal application state.
00:06:27
Speaker
There are a few different primitives in Temporal that you need to be careful about building around, like non-deterministic functions, for example, and there are specific ways you need to communicate with a worker and query data from a worker. But as long as you're using all of those Temporal primitives to take these actions, the implementation is fundamentally durable: it can handle any kind of failure and pick up where it left off, no problem. The only real caveat is things that need to be idempotent. For a payment system, say, you want to make sure the API you're using is implemented in an idempotent way. You do have the possibility of running an action twice if, say, something completed and then the worker crashed before it was able to update the state in Temporal.
00:07:17
Speaker
There is a possibility that it will replay that action, or rerun that action, instead of using previous state. So for things that are highly sensitive, like payments, you want to make sure to still use idempotency. But other than that, you really don't have to spend too much time thinking about failure scenarios.
00:07:34
Speaker
And idempotency basically means that if you do the same thing multiple times, there's only ever going to be the one outcome. That's the gist of it, right? So like you're saying, for payments, I don't charge somebody's credit card over and over because I had to replay the thing. I know I already charged them, so I don't charge them multiple times.
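A common way to get that guarantee is to derive an idempotency key from the business entity rather than the attempt, so every retry or replay sends the same key and the provider can deduplicate the charge. A minimal sketch against a hypothetical payments endpoint:

```python
import requests

def charge(order_id: str, amount_cents: int) -> dict:
    # Key comes from the order, not the attempt: retry or replay this
    # call as often as you like and the provider sees the same key,
    # so the customer is charged at most once.
    idempotency_key = f"charge-{order_id}"
    resp = requests.post(
        "https://payments.example.com/v1/charges",  # hypothetical endpoint
        json={"order_id": order_id, "amount_cents": amount_cents},
        headers={"Idempotency-Key": idempotency_key},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```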
00:07:53
Speaker
Correct. Okay, very cool. Yeah, that was the thing I was going to ask about; that seems like it would be very important. Now, not to necessarily name names with clients or anything like that, but are there types of problems where you've thought, "Wow, Temporal is really saving my bacon here, I'm so glad we had that"? Types of problems like that you can remember?
00:08:17
Speaker
Yeah, I can remember many times where the same basic scenario happened with various clients over the years. Fundamentally, a situation would occur where there's some kind of failure in production. We have workers that are blowing up.
00:08:36
Speaker
The workloads are not completing. The queue is stacking up with work. And in a traditional system, maybe if we had used SQS and had workers consuming out of that queue, if there was some kind of failure, the way you would typically handle that is you retry it a few times, and then eventually you give up and put it in a dead-letter queue. And now you have all of this stuff to clean up. Maybe you have a workflow for cleaning that stuff up; maybe you have manual processes.
00:09:03
Speaker
But you really need to think about, like, how do I get this work back into the queue to be consumed? And how do I figure out the state everything is in? With Temporal, you can just deploy a change to your worker.
00:09:16
Speaker
And when the worker comes back up, it'll pick up where it left off and resume execution from the point where it had completed things.
00:09:27
Speaker
And if you implemented the fix correctly, your workflows will just complete, right? There's no operational work, there's no cleanup. As long as you set your retries and timeouts and exponential backoff in a reasonable manner, and you're able to deploy a fix in a reasonable amount of time, your workflows will just keep on moving.
00:09:50
Speaker
And of course, there are some cases where this is not a good thing, right? Maybe there are cases where you're working with highly time-sensitive data.
00:10:02
Speaker
And so you need to have a more immediate failure, recognize it, and not keep that work in the queue. There are ways to do that inside of Temporal too. But for the situations where it's a matter of processing what's in the backlog, work that needs to be done at some point,
00:10:19
Speaker
you can deploy a fix and watch your backlog start to drain. It's a really powerful tool for being able to solve problems in production like that.
00:10:31
Speaker
That is cool. Yeah, it's funny: a lot of those integration projects I've been on in the past, it was like, okay, we'll just throw that to the dead-letter queue. But there was no plan for what you do with that stuff. Replay is not exactly straightforward all the time, right? Like you're saying, there are a lot of different things that need to happen. There could be a record that's bad right now, and I need to get a software fix out to be able to address

Challenges with APIs and Prototyping

00:10:55
Speaker
that.
00:10:55
Speaker
But there are things stuck behind it that I could process right now. Those sorts of scenarios are weird. Now, those failures you were talking about, sometimes they're temporary failures. I think we talked about in the prep call that third-party services, or off-the-shelf products you're trying to integrate with, can be problematic. Tell us a little bit about your experience with that.
00:11:25
Speaker
Yeah, so I've definitely spent a lot of time over the years integrating with third-party software solutions. And sometimes APIs are built well and documented reasonably well.
00:11:39
Speaker
I've had those experiences where it's honestly a pleasure to integrate with someone's API, and I'm just thrilled with the experience. I'd say more commonly, it's not so fun. Not so painless.
00:11:55
Speaker
I can think of a particular time early in my career when I was working on a SaaS product that offered back-in-stock notifications for e-commerce stores, right?
00:12:06
Speaker
So there was no software-as-a-service product out there at the time that did this. I think Magento had some plugin you could get, but the majority of e-commerce platforms out there had no native support for it, no plugins for it, no SaaS solution you could just drop on your site.
00:12:24
Speaker
So we were trying to build that. And we decided to target a particular e-commerce product, because we knew a lot of merchants on that platform who were very interested in this service, and we had really good relationships with them already. So it seemed like a good product to target.
00:12:45
Speaker
That ended up being a really terrible mistake. It was really easy to get beta testers and even get our first few paying customers because of the relationships we had with merchants on that platform.
00:12:56
Speaker
But ultimately, their API proved so ridiculously unreliable and broken that it destroyed our product. We could not offer a reliable experience around their API because so much of it was just fundamentally broken. Things like rate limiting didn't work: they would start denying your requests, and they would send you a header with a time delay for how many seconds you needed to wait for your request bucket to refill, or whatever.
00:13:24
Speaker
And that header would never change. It would always just say zero seconds, even while they were denying your requests. There were little things, like the data structures they returned being inconsistent: for some merchants a value would be a number inside an array, and for another merchant the same value wouldn't be in an array at all.
00:13:44
Speaker
Oh my goodness. It's like, why is this inconsistent between merchants? And so it was just constant fighting with their API, trying to build our custom rate-limiting stuff that would do exponential backoff. And we started seeing failures; the endpoint was so unreliable that we would make the request to update the product,
00:14:08
Speaker
and then we would check again a few seconds later to see if it was updated. And if it wasn't, we would try again. And if it didn't work that time, we would fail and tell the end customer, sorry, it's not working right now.
00:14:19
Speaker
Come back later. So just trying to build this product and make it integrate deeper with their platform was like pulling teeth the entire way. And it ended up killing the project.
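When you're stuck with an API like that, about all you can do is normalize defensively at the boundary. A tiny sketch of the scalar-versus-array fix (field names invented for illustration):

```python
def as_list(value):
    """Some merchants return a bare scalar, others wrap the same value
    in an array; normalize at the boundary so the rest of the code can
    assume one shape."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# e.g. variant_ids = as_list(payload.get("variant_ids"))
```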
00:14:31
Speaker
Oof. Man, I mean, that's beyond any resilience patterns. Like you said, you already tried retries, you already tried circuit breakers, those sorts of things, and that's not going to work, right? In that situation? That's crazy.
00:14:43
Speaker
I had an internal product when I worked at a client one time. I was just asking for documentation on the API, like, well, what does this return? What's the shape of this response? And they were like, well, just go read the code.
00:14:56
Speaker
So I did, because I'm a developer. And that didn't really help. It did the same thing you were talking about. There were times when a field would be a scalar value, just one single value, and then there were times when it realized, "oh, I need to be an array," and it would transform it into an array and rewrite it. There was no actual shape of the response, no object that held the shape.
00:15:15
Speaker
They just made it up and kind of munged things in as they needed to. So I've had a similar experience. That's not fun. Okay, you mentioned n8n, and I do want to hear about it. Is it "Natan"?
00:15:32
Speaker
Or why is it called n-eight-n? It's technically short for "nodemation," as in node automation, I guess. It's a node-based or graph-based workflow builder tool.
00:15:49
Speaker
It's categorized as a no-code tool, although that gets a little fuzzy because you can have nodes that implement custom code.
00:16:00
Speaker
And you can also build custom nodes from the ground up that you can just drag and drop into the UI as if they were any other native node. So there is the ability to write your own code and get it into the workflow in some senses. But for the most part, you don't really need to write code.
00:16:18
Speaker
It integrates with a lot of third-party platforms out of the box; I think they have something like 400 integrations. It definitely has its limitations. I certainly
00:16:29
Speaker
would be very careful before trying to build a product around it. I don't know, there are probably situations where it would make sense, but for the most part, if you're trying to build a product and you're shooting for large scale and multi-tenancy and all of that stuff,
00:16:50
Speaker
then you probably want a more robust solution. But if you're building prototypes to prove concepts, and you want to move really, really quickly, and you're building maybe one-off business tools or even internal company tools, I think it's a fantastic thing to keep in mind.
00:17:09
Speaker
It's just another tool in the belt, essentially. Not always the right solution by any means, but it has immense power. It's kind of one of those things where I was a little reluctant to use it at first, because I'm a nerd. I love writing code.
00:17:25
Speaker
I already don't write enough code being a manager now. And I was offered this project working in n8n for a while, and I was a little reluctant at first. But after about three weeks of messing around with n8n, I would probably say I fell in love with it. I deployed my own personal instance of it to my Kubernetes cluster, and
00:17:48
Speaker
I use it for all kinds of random personal projects, just quick things like: I want to notify myself when something restocks on some website.
00:17:59
Speaker
Ironically, we're back to stock notifications, but yeah, sometimes I do custom stock notifications for sites that don't have them. I built a little crawler that watches a hotel that my wife likes that's nearby, but it's always booked out a year in advance.
00:18:14
Speaker
So I have a crawler that notifies me when rooms become available at this hotel. And it's, it's actually worked for us before and gotten us, uh, uh, hotels under on like short notice and stuff like that. So it's a really great tool for just like hacking things together.
00:18:29
Speaker
I highly encourage people to use it, even if they don't think they have an enterprise use case for it. If nothing else, it's a great tool for personal use. I was going to ask; I didn't know that you were using it personally. That's really cool.
00:18:42
Speaker
That's adequately geeky. I love it. Okay, I'm definitely going to have to check it out. You are not the first person I've heard talk about this tool; it's popped up a few times in conversation. So I'm going to have to give it its due.
00:18:54
Speaker
And I think even in the enterprise, these sorts of tools do have their place, or they can. Like the swivel-chair integration, right? I enter data in one system, or I go look something up in one system, and then I swivel my chair over to another one
00:19:11
Speaker
and put data into another one. Those are great candidates for these types of automation tools, where you don't have to write code, but it's monotonous, tedious, or time-consuming work. You're not doing it a gazillion times a day or anything, but it'll save you time for sure.
00:19:29
Speaker
Sure. Yeah, especially one-off tasks. Like, oh, I want to get all of my sales proposal emails loaded into Google Docs. It's a one-time process.
00:19:40
Speaker
We're going to use Google Docs to store this going forward, so it's not going to be a problem. Throw the workflow together, run it, and then throw it in the garbage. It took you maybe five minutes to put together, and you never have to look at it again. No real engineering effort there.
00:19:54
Speaker
You don't even need someone super technical to do it. Check it out. So, you mentioned that you've lately been building agentic workflows, and we hear a lot about that in the market right now. That's a hot topic.
00:20:08
Speaker
Usually paired with AI: AI agents, or agentic multi-agent AI workflows. But agentic workflows themselves aren't necessarily AI. Is that right?
00:20:22
Speaker
I think that pretty much all agentic workflows have an element of AI in them. Well, let's back up a second here. I think there's a lot of marketing fluff around agentic AI.
00:20:39
Speaker
There's a lot of marketing fluff around AI in general. I define agentic AI as some sort of automated system that is goal-oriented and is iteratively working towards that goal, right?
00:21:01
Speaker
You could possibly create something that vaguely fits that description that does not use AI, but I think it would probably be very hard.
00:21:12
Speaker
Most things that don't use AI are more task-oriented, right? Most problems that we solve with traditional programming, we think of as being task-based, or maybe a collection of tasks.
00:21:27
Speaker
An AI agent receives a goal, right? And that goal is not necessarily a task or a set of tasks. It's broad: you're here, and I want you to be there.
00:21:39
Speaker
Figure out how to make it happen, right? And the agent is able to look at the problem, break it down into steps, and then iterate on those steps until it comes to a conclusion, hopefully.
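Stripped to its skeleton, that goal-oriented loop is small. A sketch, where the `llm` callable and the tool registry are hypothetical stand-ins for whatever model and integrations you're using:

```python
def run_agent(goal: str, llm, tools: dict, max_steps: int = 10):
    """Minimal agent loop: the model proposes the next step toward the
    goal, a deterministic tool executes it, and the observation is fed
    back in until the model declares the goal reached."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Assume the model replies with parsed JSON like
        # {"action": "...", "args": {...}} or {"done": "<result>"}.
        decision = llm("\n".join(history))
        if "done" in decision:
            return decision["done"]
        observation = tools[decision["action"]](**decision["args"])
        history.append(f"{decision['action']} -> {observation}")
    raise RuntimeError("agent hit the step limit before reaching the goal")
```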

AI in Automation and Code Generation

00:21:50
Speaker
Right. That's what I define agentic behavior as. And at least I haven't seen any examples that I truly think fit that definition that aren't using AI under the hood.
00:22:02
Speaker
Typically you need some intelligence under there. Yeah, to have the kind of autonomous "you figure out how to get from point A to point B" behavior you're describing, that's absolutely going to require some sort of smarts, some artificial intelligence. What I was thinking about is that I know folks get caught up in this when they're building, because we've been doing agentic stuff for a while. Event-driven systems are somewhat agentic: they're not figuring out how to do the workflow on their own, but they're processing things, they are autonomous, and they're purpose-built for a specific task. Like you were saying, I think a lot of people get caught up in thinking
00:22:40
Speaker
everything has to have AI baked into all the little pieces of the workflow. So like you're saying, yes, that orchestrator, the thing figuring out how to get from point A to point B, absolutely needs to be smart. But the tasks along the way, like "go insert this user into a database," can be very straightforward, hand-coded things, right? So those other agents that work together, or that it can enlist to get the job done,
00:23:07
Speaker
those can be very purpose-built and don't need AI to write a record into a database. But you do need one to figure out the flow: how am I going to get from A to B? So yeah, that's cool.
00:23:21
Speaker
Yeah, absolutely. I would say that most of the agents we're building currently, I would describe as being a hierarchy of workflows, essentially.
00:23:33
Speaker
At a top level, you can think of the entire thing as being an agent. And then underneath that, you may have intelligent automations, you may have sub-agents, and you may have traditional automations, or more broadly, tools that these agents are interacting with, right?
00:23:52
Speaker
And really, at least given the technology we have today and the level of intelligence we're currently able to derive from these models, you have to build a hierarchy of these different workflows that come together with some sort of orchestrator, in some cases multiple orchestrators,
00:24:12
Speaker
to actually complete the task. And there are many pieces of that which are traditional automation, or deterministic functionality.
00:24:23
Speaker
I think a really interesting example of this, actually, is at my current client: we're building AI agents to basically write end-to-end tests in their testing lab for them. The agent reads a ticket and it outputs code.
00:24:41
Speaker
The really interesting thing about one of these agents we've built that generates code is that the code generation itself is completely deterministic, which really throws people off. The analysis of the ticket
00:24:56
Speaker
is where the intelligence is, right? We analyze the human words, the English words, in the ticket, understand what they're asking for, and ultimately translate that into the domain knowledge and all of the context needed.
00:25:11
Speaker
And then we take all of our analysis and turn it into a JSON structure that represents all of the things we need, in a very deterministic and highly specified format.
00:25:28
Speaker
And then we take that JSON object. You can almost think of it as an AST. It's a little more nuanced and kind of domain-specific, but it's very similar to an abstract syntax tree.
00:25:40
Speaker
And then we pass that into a template engine, which writes all of the code completely deterministically. So the analysis takes a few minutes on the ticket side, and then generating the code takes a few milliseconds. It's almost instantaneous.
00:25:55
Speaker
So it's really interesting, because a lot of people would expect the intelligence to be on the other side of things, on actually writing the code. But in this case, we took the approach of using all the intelligence to understand the non-deterministic aspects of the problem and convert them into a deterministic, machine-readable structure that we can convert into code.
00:26:18
Speaker
And in some cases this approach doesn't work, but we've actually had really good luck with it in other cases. I think it demonstrates really well that to create agentic behavior, you can have great success by mixing traditional, deterministic logic with this intelligence. Yeah, that seems to be a common pattern. Especially with generative AI, one of the great places where you can plug it in is being that
00:26:46
Speaker
unstructured-to-structured bridge, like you're doing. So I give you some natural language, and it says blah, blah, blah, whatever, in the ticket. That's just humans typing things. And then you use an LLM, generative AI, to interpret that, understand the meaning, provide semantics so it knows what's going on. And then you tell it: here, do some structured output for me, make some sense of this world, and give me some structure.
00:27:14
Speaker
Yeah, it's a fantastic pattern.
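A stripped-down sketch of that bridge, with a hypothetical `llm` callable and a toy template standing in for the client's real framework. The only non-deterministic step is reducing the ticket to a strict JSON spec; rendering the code is a deterministic template fill:

```python
import json
from string import Template

# Toy stand-in for the domain-specific test framework.
TEST_TEMPLATE = Template(
    'def test_${name}():\n'
    '    page.goto("${url}")\n'
    '    assert page.title() == "${expected_title}"\n'
)

def generate_test(ticket_text: str, llm) -> str:
    # Intelligence lives here: turn free-form ticket prose into a
    # machine-readable spec with exactly the fields the template needs.
    spec = json.loads(llm(
        "Return JSON with keys name, url, expected_title for this ticket:\n"
        + ticket_text
    ))
    # Determinism lives here: same spec in, same code out, in milliseconds.
    return TEST_TEMPLATE.substitute(spec)
```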
00:27:19
Speaker
And in the case of the agent I was using as an example, the target is an in-house testing framework that our client built themselves years ago.
00:27:32
Speaker
So it's almost like a DSL built in Python. That makes it sound more complex than it is; it's more like a subset of Python, really. So it ends up being something we can represent in a machine-readable format quite easily.
00:27:49
Speaker
That's why I said it's similar to an AST, but more nuanced and more domain-driven. In this exact case, since it's such a narrow subset of Python we're targeting and the structure is so defined, it ends up being really easy to build a deterministic template generator and front-load all of the processing. But in other cases, we actually do need to make the LLMs understand the code and the context and have the LLM produce some Python. It's usually pretty good at that, especially given examples, you know? Doing it one-shot or few-shot, it does a pretty good job most of the time.
00:28:29
Speaker
Yeah, it depends. I would say things like Claude and ChatGPT are really good at it at this point. I've heard Gemini is really great too; I haven't used Gemini for code all that much.
00:28:39
Speaker
I know I need to give that a try. Where we're working right now, the client is very security-oriented, we'll say, and we cannot use any cloud-hosted LLM

Hosting AI Models and Optimizing Performance

00:28:52
Speaker
providers. So we actually have to host all of the workloads in their lab, on premise.
00:28:56
Speaker
So we're limited to using open-source models, and we're very limited on compute at the moment. They're working on spinning up more compute for us, but we're limited in the size of models we can play with right now. So we're kind of limited to Llama currently, and the four-bit quantized version of it at that, so it's a little less accurate.
00:29:19
Speaker
It's definitely more of a challenge with these models because they're much smaller. Or sorry, not a much smaller context window: a much smaller parameter size.
00:29:32
Speaker
In some cases, a smaller context window as well, and just generally what I perceive as lower-intelligence behavior out of these models. So it definitely can be more challenging. It feels a little more like working with ChatGPT circa 2023. That's a long time ago.
00:29:53
Speaker
Yeah. All right, in the AI world, that's a century, right? And I try to make sure I circle back when we use acronyms. DSL, you were talking about a domain-specific language, right? That's what you meant when you said DSL?
00:30:06
Speaker
Yeah. Again, I think that was probably a poor way to characterize it; it's more like a subset of Python. And then quantization, that's an interesting approach. So with these LLMs, quantization is when you take the parameters, all the numbers that are involved, which are normally, let's say,
00:30:28
Speaker
a two-byte word, 16-bit or 32-bit or whatever, and you squash them down so they only take up four bits of memory, or eight bits of memory. And that's usually good enough and gives you good enough behavior, but it doesn't take up that big amount of memory. Is that what you're talking about?
00:30:50
Speaker
Yeah, most models these days are trained at 16-bit floating point, from what I understand. That seems to be what most of them are actually released at, anyway. I don't know too much about the technical side of training, so maybe it's a little more nuanced. Don't quote me on that.
00:31:07
Speaker
But typically, you would be consuming the 16-bit version of the model, and that typically means it's really large. For Llama 3.3 70B, I think it's something like 130 gigabytes. And I'm sorry, that's actually just the 8-bit quantized version. So if you're doing 16-bit, it's a lot of memory.
00:31:33
Speaker
So dropping it down to 4-bit quantization, you do lose some of the model accuracy. It's not exactly easy to understand the impacts of that, but I've seen some people quote that it's somewhere between two and five percent. I don't know how you even determine that.
00:31:53
Speaker
But you ultimately do lose a little bit of accuracy and performance there. The upside is that you can run it on, like, a MacBook, right? You suddenly only need, I think it's 73 gigs to run the four-bit quantized Llama 70B, the 3.3 version.
00:32:11
Speaker
So it at least allows you to build prototypes and validate some ideas and things like that. We'll be moving to the larger versions of the model in the near future as their lab brings on more CPU, or sorry, more GPU. But as I'm sure everyone knows, right now that's a very constrained resource. It's hard to get your hands on it.
00:32:32
Speaker
They're very power-intensive. The lab actually had to bring in more power for the servers initially, just to get the six GPUs we have now. So it's pretty intense, the amount of VRAM these models consume, even the open-source and relatively small ones. Yeah, the way I think of quantization, I always like to use analogies so I can understand things. I think of it like having a 4K video: you can watch it at a lower resolution and you still
00:33:07
Speaker
know what's going on, right? It's just not as great of an experience. It's not as crisp or whatever, but you can still see, oh, there's a person, or a cat, or whatever. You get the gist of it. That's how I think of quantization: downsampling the resolution on a video, kind of the same idea. That's how I think of it, at least.
00:33:29
Speaker
That's a really great way to look at it, but instead of the resolution of the pixels, you're looking at the resolution of the floating-point numbers. And it's surprising that, going from 16-bit down to 4, it's only that bad. That's pretty cool.
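The back-of-the-envelope math behind those sizes is just parameters times bits per parameter (weights only; real deployments add KV cache and runtime overhead, which is presumably why the figures quoted above run a bit higher):

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight memory: params x (bits / 8) bytes each."""
    return params_billion * bits / 8

for bits in (16, 8, 4):
    print(f"70B model @ {bits}-bit: ~{weight_memory_gb(70, bits):.0f} GB")
# 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB
```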
00:33:45
Speaker
Awesome. Now, one thing I did want to ask about on these agentic workflows: how has your experience been with error handling when things don't go as planned? How is it integrating with these LLMs when they're trying to plan things out and they run into a hiccup, so to speak? How does that work?
00:34:08
Speaker
Yeah, it's a great question. It really depends on the context. There are some workflows we have that are writing code, and they actually take the code they write and run it in a Python runtime and look at the results, right?
00:34:27
Speaker
And if it throws an error, we actually just feed that back into the LLM and say: here's what you tried, here's the error we got, try again. And we iterate on that. So that's really cool. In other cases, we have much more boring workflows that are just making calls out to Jira, updating ticket status, things like that, right?
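A minimal sketch of that run-and-feed-back-the-error loop (again, the `llm` callable is a hypothetical stand-in):

```python
import subprocess
import sys
import tempfile

def generate_until_it_runs(prompt: str, llm, max_attempts: int = 3) -> str:
    """Generate code, execute it, and on failure hand the traceback
    back to the model for another try."""
    code = llm(prompt)
    for _ in range(max_attempts):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        result = subprocess.run(
            [sys.executable, f.name],
            capture_output=True, text=True, timeout=60,
        )
        if result.returncode == 0:
            return code  # it ran cleanly
        code = llm(
            f"{prompt}\n\nYour previous attempt failed with:\n"
            f"{result.stderr}\nFix it and return the full script."
        )
    raise RuntimeError("no runnable code after retries")
```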
00:34:50
Speaker
And sometimes those endpoints are a little unreliable within the lab environment. There are a lot of proxies involved, a lot of interesting things going on in this customer's lab. So sometimes there's intermittent failure on their self-hosted Jira instance.
00:35:06
Speaker
And so, lots of retries, exponential backoff, that kind of stuff. All pretty standard things. I will say that's one area where we're hitting a bit of a pain point with n8n, because it does not offer exponential backoff functionality.

Error Handling in AI Workflows

00:35:24
Speaker
You can have retries, and you can set the interval, but you can't have them back off if it keeps failing. The idea would be: retry in a second; if that doesn't work, retry in two seconds, then four, then eight. That's what you mean when you talk about exponential backoff.
00:35:40
Speaker
Yeah, exactly. Typically, the way I see it done is you have your maximum retry count, which means we won't try more than that, no matter what.
00:35:54
Speaker
You have your retry interval, and then you have your exponent, which you basically multiply by every time you retry, and that becomes your new duration. So yeah, n8n doesn't support that yet. I imagine it's probably coming, but it's been a bit of a pain point for us. In the meantime, we just retry a lot, because it's the only option we have for those unreliable third-party services.
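For reference, the scheme Austin describes (a max attempt count, a base interval, and a multiplier) is only a few lines when you do control the code. A sketch, with a little jitter added so clients don't all retry in lockstep:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=1.0, factor=2.0):
    """Retry `call`, waiting base_delay * factor**attempt between tries
    (1s, 2s, 4s, 8s, ...) plus jitter; re-raise on the final failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * factor ** attempt + random.uniform(0, 0.5))
```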
00:36:20
Speaker
This is also somewhere I think Temporal will really shine longer term for holding a lot of these workflows, especially since latency isn't a huge issue for us. In many cases, we have hours to process the incoming events we're dealing with.
00:36:39
Speaker
In some cases, we even have days. So using something like Temporal, with its durable execution runtime, and exponential backoff and all that good stuff built in,
00:36:52
Speaker
longer term, we see moving to that as a solution for some of these intermittent failures. And beyond that, we're still very much in a prototype phase, using all no-code tools, really trying to prove concepts.
00:37:08
Speaker
None of this stuff is in production yet. We're still working with the legal team at this particular company to get authorization to actually start using it on production data and in production use cases.
00:37:20
Speaker
So we're very privileged in that we don't have to worry about that stuff too much yet. It's definitely a concern on the horizon as we scale up and move into a more productionized kind of solution.
00:37:37
Speaker
But as of right now, that's really the only thing we've done related to retries. Yeah, and I'm trying to come up with a takeaway; it's interesting, I'm seeing some parallels. The way we approach product, and I'm not a product person myself, but our group at Callibrity, when we approach product, one of the concepts as we go through usability testing and those sorts of things is that you want to stay ugly, so to speak, as long as possible. You don't want to be in there writing really intricate code
00:38:10
Speaker
for prototypes. You stay ugly, low fidelity, all of those things, as long as you can. It's kind of the same concept: you're using these low-code, no-code tools, not to say they're ugly, but it's less investment up front as you're trying to prove out ideas. And then as things start to take shape, maybe you upgrade to Temporal instead of n8n, those sorts of things. I don't want to say "upgrade," that's probably a dismissive way to talk about n8n. But that's the idea: you invest a little more time into higher and higher fidelity tools, so to speak, as you get along with your idea and prove it out further. Would you say that's a fair characterization?
00:38:52
Speaker
Yeah. And I think that aligns really well with, I'm sure you've heard the term, "make it work, make it right, make it fast." I love that. The first time I heard it was about eight years ago from another Bitovian when I was in training, and it really resonated with me.
00:39:09
Speaker
I like to almost always default to taking that approach. We have some clients who don't like that approach, so there have been times I've had to adjust, but I like to push people in that direction.
00:39:23
Speaker
I think that, first and foremost, getting something out there and working that people can put their hands on is the most valuable thing you can do. Even if it's half broken, even if it's not perfect, doesn't look pretty, has limitations, whatever it is.
00:39:38
Speaker
Prove the concept as early as you can. Start getting feedback as early as you can. Start learning how this thing should and should not be designed really quickly, right?
00:39:51
Speaker
And then, once you have that, you make it right. You improve upon it and fix the things that are wrong with it, and you make sure it's built in a way that is reliable, maintainable, robust, and future-proof.
00:40:08
Speaker
And then, if and only if you have performance concerns, because you don't want to optimize performance you don't need, if you have some kind of constraint related to performance, then you make it fast.
00:40:21
Speaker
And so I think n8n falls really nicely into that first category of "make it work." Just get something out there, understand how the pieces fit together, understand how your data flows. Because you can spend all day planning, writing up documents, and building diagrams, but until you see the data flowing through your system and you see the result you want to see,
00:40:44
Speaker
you won't know for sure that you're not missing something. I had a professor in college who had the same concept. I remember to this day, he talked about the three R's of software development, or software engineering:
00:40:55
Speaker
make it run, make it right, make it rip. That was the way he phrased it, the three R's: make it run, make it right, make it rip. I don't even remember who the professor was, but I do remember that lesson, the three R's.
00:41:07
Speaker
That's pretty good. Good way to remember, good mnemonic, so to

Kubernetes as a DevOps Tool

00:41:11
Speaker
speak. Our next segment is what we call Ship It or Skip It. Ship or skip, ship or skip. Everybody, you've got to tell us if you ship or skip.
00:41:22
Speaker
How about Kubernetes? Is this a DevOps power tool or an over-engineered headache? Ship it or skip it?
00:41:33
Speaker
Ship it. I'm a really big fan of Kubernetes personally. I do think it's dangerous to look at it as the de facto option for deploying things.
00:41:45
Speaker
I think there's a lot of value in other tools, like just throwing a container up on an EC2 instance in Docker Compose. That's great for some things, right? And it's fast. There are all these new fancy container-first platforms like Fly.io, and they're like, give us your container and we'll run it for you. It's all magic.
00:42:04
Speaker
I think that's great in some ways, but there are a lot of downsides to those approaches. Most of them are around scaling.
00:42:17
Speaker
I have a customer who operated on Fly.io, and they ended up reaching the point where they had something like 7,000 concurrent VMs in their production environment.
00:42:29
Speaker
And Fly.io just could not keep up with that. Their dashboard would break. Their CLI tools would just time out when trying to scale things up and down. It was a mess.
00:42:40
Speaker
And ultimately, the right solution for them was to move to Kubernetes. But there's an ocean of difference between a container running in Docker Compose on an EC2 instance and someone running 7,000 VMs on Fly.io.
00:43:00
Speaker
Kubernetes is somewhere in between, and it's probably a little different for everyone. I could probably talk about Kubernetes all day here. I have a Kubernetes cluster in the closet of my garage, actually, made out of Raspberry Pis. So yeah, long story short, Kubernetes is great.
00:43:19
Speaker
Don't be afraid of it. It's not as hard as it seems at the surface level. But also don't be dogmatic about it, and don't assume it's always the right thing for your solution.
00:43:31
Speaker
And as tempting as it may be to start out on Kubernetes for the future-proofing and the scalability, maybe just don't do that. Start out with something simpler. Yeah.
00:43:45
Speaker
Yeah, I'm with you on Kubernetes. I'm a ship it. I do think it's a power tool on the DevOps front. And the main reason I like Kubernetes is that it levels the playing field. Everybody knows it's kind of the standard, right? So you have
00:44:00
Speaker
frameworks and utilities that know this world, and you can leverage those and piece things together. So I do like Kubernetes for that. Okay, any final thoughts or advice you'd want to share with the audience?

Embracing AI and Concluding Thoughts

00:44:13
Speaker
I would just say, and this kind of falls into the advice category: I see and interact with a lot of people who are very
00:44:28
Speaker
skeptical of AI. And to an extent, that's rightfully so. You should be skeptical of AI, and especially skeptical of the advancements people are promising.
00:44:39
Speaker
But I think you should also lean into it and embrace it to a certain extent. Even if it doesn't feel like the most natural thing to do, even if you have a lot of suspicions about it, no matter how you look at AI, there is some value to be derived from it.
00:45:03
Speaker
And there are things to learn from it, and ways you can use it in a positive way. There are also definitely downsides, things it's not good at, and ways you can use it for bad things, for sure.
00:45:16
Speaker
But I think that's true of any tool, and I think people get caught up a little too much on the philosophical questions around it. At the end of the day, we're not going to be able to fight it. I mean, we're looking at basically the next industrial revolution, right? Anyone who is fighting AI now is going to be seen in history the same as the people who were fighting the machines in the Industrial Revolution.
00:45:44
Speaker
That is what we're looking at, fundamentally. So you can fight it and try to push it off, but I believe at this point it's inevitable. And if you embrace it and lean into it, I think you'll be uniquely positioned not only to use this stuff better and more competently in the long run, but you may also find yourself in a situation where you're able to help define the future of how humans and AI systems interact.
00:46:17
Speaker
And I think that's very important. Yeah, John Henry was a great story, you know, but we ended up with steam engines, and he died in that story. So it's going to happen, and I think we can help ourselves. The thing people get creeped out by with AI is that we spend so much time trying to make it human, where I think we'd do ourselves a favor by leaning in, as you're saying, where AI can do things that we can't. Don't try to make it human; try to make it do the things we can't do. Our perception is limited by what we can see and hear and taste and all that; it's not limited by those things, at least not directly. So let AI really take over where we would fall short, and we can be complementary to one another.
00:47:03
Speaker
This has been great. It was a great conversation; happy to have you on anytime. This was fantastic. Thanks to the staff and all the folks who make this podcast possible. And thanks for tuning in to the Forward Slash Podcast, where bold ideas shape what's next in IT. If you enjoyed today's conversation, subscribe, share, and above all, stay curious.
00:47:22
Speaker
Until next time.