
#16 - Alexander Acker - What It Really Takes to Train AI and How to Make It Accessible to Many

E16 · Adjmal Sarwary Podcast

How did we get from the first AI models to today's cutting-edge language models, and what’s next for AI infrastructure?

In this episode, I sit down with Alex to explore the history of AI, from early perceptrons to GPT-4, and the often overlooked hardware and system engineering challenges behind modern AI.

We discuss why building and training AI models today is far more than just code and data: it's about scaling infrastructure, managing distributed GPUs, and creating robust pipelines for fine-tuning and domain-specific AI.

Key highlights include:

  • A deep dive into the evolution of neural networks, from perceptrons and backpropagation to transformers and LLMs.
  • Why NVIDIA’s early involvement with AI hardware became a game-changer—and how GPUs moved from gaming to powering AI breakthroughs.
  • The real cost and complexity behind training large models—and why fine-tuning, LoRA, and model distillation are critical for smaller players in AI.
  • A behind-the-scenes look at AI infrastructure challenges: what it takes to train, distribute, and deploy models at scale.
  • How Alex’s project Exalsius is building a platform to give AI startups and researchers better, more affordable access to distributed GPU compute, helping democratize AI development.

If you’re curious about where AI is headed, and what it really takes to build models beyond the hype, this is a conversation you don’t want to miss!

Transcript

Introduction to AI and Guest

00:00:00
Speaker
Hey, what's up everyone? This is Adjmal Sarwary and welcome to another podcast episode. Let me ask you something. How do we actually build AI systems that don't just sound smart, but are grounded in real understanding? And what does it take to get there?
00:00:14
Speaker
In this episode, I'm joined by Dr. Alexander Acker to unpack the hidden layers of AI development. From how we got from the first neural networks to today's massive large language models, to why AI training isn't just a matter of hitting run,
00:00:28
Speaker
but a complex dance of infrastructure, data and experimentation. We also dive into why smaller companies are locked out of AI training and what it would take to change that. It's a deep technical conversation, but one that pulls back the curtain on what's really driving the AI revolution.

AI Development Journey

00:00:46
Speaker
Enjoy.
00:01:02
Speaker
Hey everyone, and welcome to another podcast episode. If you're new here, my name's Adjmal. I'm a neuroscientist and entrepreneur. On this podcast, we explore the links between science, technology, business, and the impact they have on all of us.
00:01:17
Speaker
Today we talk to Dr. Alexander Acker. Alexander is the founder and CEO of Logsight AI, where he leads a team that is dedicated to helping companies efficiently manage their increasingly complex AI systems.
00:01:29
Speaker
He holds a PhD in distributed systems and AI from TU Berlin and authored over 30 scientific publications. Alexander has collaborated with companies like Siemens, Deutsche Telekom and Huawei to optimize the performance of large scale data centers, backbone networks and AI applications.
00:01:47
Speaker
He focuses on bridging the gap between research and practical implementation with a strong interest in deep tech business development to drive the broader adoption of innovative technologies. His passion lies in unlocking the potential of efficient system operations for complex systems and large-scale AI models, creating value horizontally across industries.
00:02:08
Speaker
All right, enough background. Let's get into it, shall we? All right. So, hey, Alex, it's great to have you. We met at the Applied Data Summit held by Deconium, and our friend Arash introduced us back then. I remember we were in the first row and you sat next to me, and I was really excited to hear what you were going to talk about.
00:02:29
Speaker
So back then, your presentation was actually about um optimizing AI operations, balancing cost, efficiency and sustainability.
00:02:40
Speaker
And, you know, I think this is becoming an increasingly important topic, which needs to be put in the right perspective and reference, because that's most often missing, especially when it comes to the energy consumption of training these models.
00:02:54
Speaker
And this reference point needs to be, well, what's the energy consumption in reference to what else? Otherwise you just see these big numbers and people don't know what that really means.
00:03:07
Speaker
And especially now, it shouldn't be overlooked that big improvements in efficiency are actually taking place. You always hear about the energy consumption, but not that we made these big improvements in energy consumption overall.
00:03:24
Speaker
So the costs of these models are plummeting further and further. It becomes cheaper to make them. On the one hand, this is really exciting to me for sure. And on the other hand, it's a little bit scary too, because, well, what is this all going to mean for us?

Historical Overview of AI

00:03:40
Speaker
But for today, let's do a deep dive into the specifics of model training to get an overview of what is actually going on. Too much is happening at once, and I hate the noise. It's annoying, and you can't really cut through it anymore. So I'm so excited to talk to you about the essential aspects.
00:04:00
Speaker
And I think the listeners will also appreciate your insights for sure. But before I keep rambling on, all this buildup will get us to the actual main topic today. And it's about a project that you've been working on and are about to start.
00:04:14
Speaker
So let's begin here. Tell us a bit broadly about this project. And after that, we can dive into the nitty gritty, so it's put in the proper place and people understand why it is so cool and important. I give the word to you. Yeah, of course. Thank you very much for having me. I was looking forward to talking to you about this topic. I mean, you said it in the introduction quite well: I think things need to be put into a proper perspective, right? Yeah. And I think, to put things into perspective...
00:04:48
Speaker
I have a research background like you, right? And I really like to take a look at the history of things, because usually it provides a very good perspective on understanding things today and why they are happening the way they are happening.
00:05:06
Speaker
So, if you don't mind, let me quickly summarize or give a brief history of, let's say, AI. Yeah, please. And then I think it might help to align the puzzle pieces and to understand why things are connected the way they are today.
00:05:25
Speaker
So, I mean, back then... It all started, I think, in the area that is actually your expertise, right? Right. Back then, I think it was the late 1800s, when people started to learn about the thing called the neuron and how these kinds of entities, the neurons, were connected to each other. So you have a network of them and you have some kind of an input signal. It's traveling through the network, and whenever
00:05:57
Speaker
a neuron receives a certain amount of signal, it fires itself and activates, or contributes to the activation of, its neighboring neurons. And people started to understand that this is a form or method that allows you to encode information, right? So you can basically take a look at these activation patterns and you can understand which kinds of activities or thoughts people are having. When you're fearful, your amygdala is activated, and things like that.
00:06:37
Speaker
So after understanding this, I would say, very basic concept and how our brain worked, people started to think about replicating this method into software.
00:06:48
Speaker
So they started to think about how we could model this kind of thing and whether it's possible to model it in a way that's basically replicating the human brain. This was, I mean, back then it was after the first computers were invented, after the transistor was invented. And I think it was 1958, '59, something like that, when Rosenblatt came up with this perceptron concept, right? That's right. And this perceptron concept was a very, very first...
00:07:22
Speaker
implementation of this idea, but it was very small. It had only one layer, and the neurons back then, the perceptrons, were connected in a linear way, so you were only able to draw linear decision boundaries in your search space.
00:07:39
Speaker
For example, it was possible to solve, let's say, linear functions over certain iterations of examples, and then it was possible to see that this interconnected single layer of perceptrons was converging to something. But it was only converging to a linear representation of things. So, for example, you were able to model the OR function, but not the XOR function, because there you need non-linearity in it.
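To make that concrete, here is a minimal sketch of the single-layer perceptron described here (Python with NumPy; the data, learning rate, and epoch count are just illustrative). The classic update rule converges on OR, which is linearly separable, but never settles on XOR:

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Single-layer perceptron: one linear decision boundary w.x + b."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi   # classic perceptron update rule
            b += lr * (yi - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or  = np.array([0, 1, 1, 1])   # linearly separable -> converges
y_xor = np.array([0, 1, 1, 0])   # not linearly separable -> never converges

for name, y in [("OR", y_or), ("XOR", y_xor)]:
    w, b = train_perceptron(X, y)
    preds = [(1 if x @ w + b > 0 else 0) for x in X]
    print(name, "predictions:", preds, "targets:", list(y))
```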
00:08:06
Speaker
And this started to be a very exciting idea, because people, I think, grabbed the concept of the ability to model the human brain in some form of perceptron, but it was still missing a very important piece, which was the ability to train these things on nonlinear decision surfaces, right? So for example, when I want to train my perceptron over multiple layers to detect, for example, numbers in images, then I need something
00:08:49
Speaker
more sophisticated, something that is non-linear. And it took some time. The time afterwards was the time of LeCun and Geoffrey Hinton, you know, these kinds of people that are well known today. Geoffrey Hinton won a Nobel Prize for it. I mean, they were the godfathers of AI. Their work was based on something which is called backpropagation.
00:09:22
Speaker
Backpropagation is basically the way of training neural networks to give them the ability to learn new things, to learn to detect numbers in images or to distinguish cats from dogs in images. And this backpropagation took some time to be invented. So while the perceptron, as we said, was something around 1960, backpropagation really took off around 1980.
00:09:56
Speaker
The basic... so this is one thing that I think is important to mention: the basic mathematical foundation was actually invented by a guy called Seppo Linnainmaa. And he was a master's student at a Finnish university.
00:10:10
Speaker
Oh, wow, I didn't know that. This was the basic mathematical foundation for the backpropagation algorithm. However, he didn't apply it to neural networks. It was a different problem.
00:10:22
Speaker
And therefore, I think in the current literature his contribution is still a little bit underappreciated. Because his work was then later on picked up by Werbos. He was working back then at Harvard. His professor was Karl Deutsch, I think. Back then, there was this area called cybernetics. I know. It was the thing before AI, right? People were dreaming about technology that could think and self-replicate, these kinds of things. And the umbrella term back then was cybernetics.
00:11:01
Speaker
And Karl Deutsch was one of the leading researchers back then. And Werbos was his PhD student. And he actually applied this backpropagation algorithm to an early version of neural networks, and his idea was then picked up by LeCun and Geoffrey Hinton, and they applied it, like they went really deep into digital image processing.
00:11:32
Speaker
So that's where they invented things like the CNN, the convolutional neural networks, this kind of thing that takes a look at the picture and tries to extract meaningful features from it, which later on help the neural network to classify the pictures or detect objects in them.
00:11:55
Speaker
So this was, we're now at 1980, right? But this was like the next AI summer. So everyone was hyped back then. But I think these guys back then, they approached a certain limit, right? They tried to stress the...
00:12:14
Speaker
resources that they had at this point to the maximum. However, there was a lack of resources. The lack of resources was basically compute and data. Back then, in the '80s, we didn't have the internet, so data was really, really scarce. And, I mean, when you take a look at the literature, people usually say that the algorithms were also not that sophisticated. But obviously they were not, because there was no necessity to improve them, right? Necessity is the mother of innovation: you invent something new if you need to, and you couldn't, right? I mean, you had the limits of memory, so even if you had all the labeled data in the world, there was no way you could actually
00:13:05
Speaker
put that into a computer. There's just no way. And same for the processor, right? You had great ideas for how to maybe make backpropagation faster, algorithms or implementations that calculate partial derivatives much faster.
00:13:22
Speaker
But, well, if the processor isn't there, there's nothing you can do. It remains theory. Exactly, exactly. So this is an interesting thing, which I think was the main reason why we had this next AI winter, right? We didn't have the compute, we didn't have the data. It took some time to fix this.
00:13:44
Speaker
And then if we fast forward a little bit into the years 2000 to 2010, and if you remember that time, or at least I remember that time, it was the time when the internet started to take off, right? Yeah, yeah. So we had the first versions of social networks back then.
00:14:05
Speaker
We had YouTube. News articles were available over the internet. So a lot of data was compounding over time. People started to generate these datasets on the internet without anyone, you know, forcing them to do it. They just did it because they wanted to share things. And this was actually a really nice vehicle for AI to collect the datasets that AI needs in order to improve.
00:14:36
Speaker
And I would say the next point in the history of AI would be 2009. This is the point where Fei-Fei Li, who was working at Stanford, published the ImageNet dataset.
00:14:57
Speaker
The ImageNet dataset, just briefly, is a huge dataset of images: over three million images that are separated into 5,000 classes.
00:15:10
Speaker
And this is something that is really, really difficult, or it was basically impossible, for a computer algorithm to solve, right? To analyze 3 million images and to classify them into 5,000 different classes automatically, without human intervention. And...
00:15:30
Speaker
It was interesting to see that people tried it, obviously. It was interesting to see that all the image processing algorithms back then didn't really work out. And there was one group also working on this project, which was the group of Geoffrey Hinton. And in his group was actually Ilya Sutskever. I think a lot of people know him from...
00:15:57
Speaker
OpenAI. And Alex Krizhevsky. And the three of them basically implemented a neural network that was applied to the ImageNet dataset and solved it at the human capability level.
00:16:14
Speaker
And this was like an earthquake back then in the AI community. And there's one thing that is really interesting for me there, because I didn't see too many people talking about it, but I find it especially interesting: AlexNet, which was the name of the neural network that solved ImageNet, was the first neural network trained on GPUs.
00:16:43
Speaker
So before that, everything was done on CPU. And this is so nice, because the guys back then, Geoffrey Hinton, Ilya Sutskever, Alex Krizhevsky, they sat down and implemented the training algorithm, the CNN training algorithm. It was an 80 or 60 million, something like that, parameter model with eight layers. So it's rather small for today, but back then it was huge. Yeah, it's huge.
00:17:12
Speaker
And they sat down, implemented this, and they tried to train it on CPU. And they estimated the runtime to be several months. So, you know, they sat down, they implemented this thing and they were sure that it will work.
00:17:27
Speaker
However, it would take months to train. And they basically, more or less, started to drop the idea. But then Geoffrey Hinton met with... Jensen Huang.
00:17:39
Speaker
Oh, really? Yes. And they started to discuss because Jensen was... Geoffrey Hinton described to him the basic idea and also the problem, right? The limitation.
00:17:50
Speaker
And back then, GPUs were only used for rendering. yeah it was just it It was just a gaming thing, right? yeah So you render images with GPUs because rendering is highly parallelizable workload.
00:18:01
Speaker
You can just distribute on the GPU and it will just crush it. Yeah. and Nobody thought about applying this kind of parallelism that is possible with GPUs on the training of AI models before them.
00:18:19
Speaker
They sat down, they discussed it, and then actually the NVIDIA engineers helped them to implement this training workflow on the GPUs. And it was the first neural network trained on GPUs, two GTX 580s back then, and it reduced the time from months to days. Wow, wow, I didn't know that. Just so everyone knows, Jensen is the CEO of NVIDIA. Okay, wow. So now this kind of brings us full circle, looking at how things are today.
00:18:56
Speaker
And now that you say that the engineers helped them to make this happen, this also explains why NVIDIA started so early on to make the software, to make this translation layer in between. Everybody knows this now as CUDA.
00:19:16
Speaker
That's, wow, I didn't know that. Yeah, it's amazing. It was basically the birth of CUDA, more or

Advancements in Neural Networks

00:19:24
Speaker
less. Yeah. Wow. That's insane.
00:19:27
Speaker
Yeah, I get really excited when I think about this because, you know, all the dots, they connect somehow and now it becomes more understandable, I would say, why NVIDIA is so strong currently in AI. Oh, for sure. Because they were, back then, they were supporting the first neural networks. Yeah, the very first.
00:19:48
Speaker
And it's only 15 years, right? Yes. but It's only, I mean, most people think that's an eternity. Well... It isn't so compared to the 30 years of AI winter before. That's, yeah.
00:20:04
Speaker
Wow. It's amazing. I mean, it's amazing. And then, basically, the next steps, right? Everyone was focusing on images back then, because everyone got excited. It was a breakthrough.
00:20:17
Speaker
This research group, these people, showed that it's possible to train human-level capabilities into a neural network, at least for the image analysis domain. Right. That was amazing. A lot of researchers focused their initiatives on image processing. Image analysis became, for several years, the main topic of AI back then. But there's another thing. I mean, when I think about it, it's a little bit natural: we as humans are visual creatures, right? We perceive most of the stuff around us via our eyes.
00:21:04
Speaker
It seems natural for humans, as visual creatures, to focus on something and to replicate it into some artificial setup, right? Yeah. So I think it's quite interesting, but there's another thing that is important in the world around us, right?
00:21:21
Speaker
One thing is images. So we would like to process images with AI. But the other thing is that we need to process sequences, because sequences are all around us. We have sequences like videos, which are sequences of images. We have language, which is a sequence of words, and words are sequences of characters.
00:21:45
Speaker
We also have time series, for example when we do measurements, which are sequences of sensor signals. So the other branch, which was not that far developed at that time but got more attention, is the modeling of sequences.
00:22:05
Speaker
And there, it's quite interesting, Geoffrey Hinton, in his early days, back in 1980, 1990, was already researching recurrent neural networks. Those were the first entry point into modeling sequences.
00:22:19
Speaker
And then, based on that, things started to move in that area after the invention of the LSTM, which was, by the way, invented by Sepp Hochreiter and the other guy, also from Germany, right? So they're German researchers.
00:22:37
Speaker
yeah um And they were basically able to set up an architecture that was
00:22:48
Speaker
encoding sequential information from the past inside of a hidden representation of the neural network and use this as context for predicting the next sequence element.
00:23:01
Speaker
Yeah, yeah. And it worked quite well. So compared to a vanilla recurrent neural network, the LSTMs could model sequences of several hundred elements.
00:23:12
Speaker
At least at that time, that was considered long sequences. However, it was not enough to model language, because in language we have multiple thousands of tokens.
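As a rough illustration of the kind of sequence model being described, here is a minimal next-token LSTM in PyTorch; the vocabulary size, dimensions, and random toy data are placeholders, not a real training setup. The hidden state the LSTM carries forward is the "context from the past" mentioned above:

```python
import torch
import torch.nn as nn

# Toy next-element predictor over a vocabulary of 100 tokens.
vocab_size, embed_dim, hidden_dim = 100, 32, 64

class NextTokenLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))   # hidden states carry past context forward
        return self.head(h)                    # logits for the next token at each step

model = NextTokenLSTM()
tokens = torch.randint(0, vocab_size, (8, 50))   # random stand-in sequences
logits = model(tokens[:, :-1])                   # predict each next element
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                  # gradients for one training step
print(loss.item())
```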
00:23:25
Speaker
And there, LSTMs basically suffer from this kind of
00:23:34
Speaker
sequence length limitation, where they tend to forget things that are too far in the past in the sequence. Therefore, if something happened, let's say, at the beginning of the text, they are not able to recognize it and use this information in predicting a good answer for the overall text that you input into these models.
00:24:02
Speaker
Therefore, LSTMs were nice, but they're not the solution for sequences. And then we basically moved towards the area of attention, right? Yeah.
00:24:14
Speaker
Where you have the transformer model. I think this was the next breakthrough moment after AlexNet: the release of the transformer model by the Google researchers.
00:24:25
Speaker
And by the way, again, the foundation of the attention mechanism, which is a very integral part of the transformer, was invented by a researcher called Dzmitry
00:24:45
Speaker
Bahdanau, I think. He was a researcher at the University of Bremen. Really? Yeah. He laid the mathematical foundations for the attention mechanism, which was then later picked up by the NLP group of Stanford. The Stanford NLP group was the leading group in natural language processing. They used it mainly for translation back then. And most of the talent from this group was hired by Google.
00:25:15
Speaker
So they basically worked on something that is very close to the attention mechanism of the transformer. And it was then, let's say, finalized at Google Research. I just say this because it's not like the transformer magically popped up at Google Research; there's a history of...
00:25:41
Speaker
academic research behind it that led to this innovation at Google. However, the transformer model, that basically changed everything. Because the transformer has two advantages.
00:25:51
Speaker
It can process input completely in parallel, and the model can basically decide at which elements of the sequence it can put more attention to because they might be more important for predicting the future.
00:26:06
Speaker
And during training, the model is basically learning where to put the attention, right? So when we do very simple sentiment analysis, we probably need to put a lot of attention on things like bad or good, smiling, crying, these kinds of things. They carry a lot of information for predicting the sentiment of a text.
00:26:31
Speaker
And the same goes for today's LLMs when we chat with them. However, it's more subtle, right? It's a little bit more complicated there, because the statistical distribution of language is very complex and long-tailed, which requires this model to be trained on a lot of data.
00:26:50
Speaker
But the thing is... Before the transformer, it wasn't possible to train huge models on huge amounts of text data. With the transformer, it became possible.
00:27:03
Speaker
Yeah. And it became possible due to two reasons. One is the engineering part: the bottleneck of parallel processing of the input was lifted. The transformer can fully parallelize the computation of the input data.
00:27:21
Speaker
And the second part is the attention mechanism, which allows the model to decide where to put the attention at the input in order to predict the next token or the next sequence element.
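A minimal sketch of that attention computation (scaled dot-product attention, in PyTorch; the dimensions and random inputs are illustrative): every position scores every other position in parallel, and the softmax over those scores is the learned "where to put the attention":

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Each position scores every other position (in parallel); softmax turns
    the scores into attention weights; the output mixes the values accordingly."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # (seq, seq) similarities
    weights = F.softmax(scores, dim=-1)                       # where to "put attention"
    return weights @ v, weights

seq_len, d = 6, 16
x = torch.randn(seq_len, d)                            # stand-in for embedded input tokens
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))     # learned projections in a real model
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(attn.shape)         # (6, 6): one attention distribution per input position
print(attn.sum(dim=-1))   # each row sums to 1
```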
00:27:35
Speaker
It gives some kind of freedom to the model, which is also a little bit related to the bitter lesson of Richard Sutton, although that was formulated later on, but never mind. The thing is,
00:27:47
Speaker
The transformer was great. It allowed training at scale, massively. Like, really, trillions of tokens were possible. They didn't train it back then on trillions of tokens, obviously, but it was possible.
00:28:04
Speaker
It became possible. And the other interesting part now is that people that previously worked at... or important people, researchers that previously worked at Google Research,
00:28:18
Speaker
they started or they were part of the founding team of OpenAI. but So we have Ilya Sutskever, previously Google Research, now at OpenAI.
00:28:31
Speaker
We also have Andrej Karpathy. Dario Amodei was not a co-founder, but one of the first researchers hired by OpenAI.
00:28:44
Speaker
And these guys basically took the research and all the learnings that they had from Google and started their startup. And they started to release models, right? So this is the point where we had this breakthrough moment of Attention Is All You Need, the transformer is there, now let's scale it to I don't know what.
00:29:02
Speaker
And the first model that came out was huge. It was 170 million parameters, GPT-1. It was still a toy model. It was, yeah. And then the next model, GPT-2, in 2019, 1.7 billion, like a factor of ten. It was better, but still like a toy model. And they decided to scale it even more, right? They decided to scale it to the next level, which was GPT-3, which was, I think, 170 billion, another scale-up. And, I mean, to train this model the estimate goes between 4 and 12 million. For a startup, imagine this: a startup just throwing 12 million at the problem to see what happens. It's insane, in my opinion. Yeah, it's insane. We will talk about this shortly. It's the definition of the YOLO run: hey, you only live once, let's see what's going to happen. You just press enter
00:30:06
Speaker
and your eyes will be glued to the dashboard, and you just hope it will converge and things will improve and that there are no massive spikes in between. Yeah, 12 million, that's a lot.
00:30:20
Speaker
I mean, now, that's what I meant with putting it into perspective. When we talk about models now, it seems like not so much money. But back then nobody really knew...
00:30:34
Speaker
Would this result in better performance? Would this make the thing better?

OpenAI and GPT Models

00:30:39
Speaker
So getting 12 million and just pressing enter and then hoping for the best. That takes some guts, I would say. That's for sure. I mean, I would really love to take a look inside the heads of the people that decided to just do it, right? Yeah. I would imagine they had some internal signals that the probability of this succeeding is at least not zero. Sure, sure. But still. But still, yeah, it is a huge risk in my opinion. So it...
00:31:19
Speaker
Could have gone wrong, but it went quite well. It went quite well. It was a very nice moment. GPT-3 was still, I would say, nice, but not really useful for practical stuff. I think the first really useful model was GPT-3.5. Yes. But still, GPT-3, I remember when it came out and I played around with it, it felt like magic.
00:31:42
Speaker
I could just ask it, write me a poem about this and that, and things would rhyme. It would sound good. I would say it would take me probably, I don't know, an hour to write that poem. The GPT model was able to do it just out of the box. It was amazing. Yeah. No, I also remember when I first...
00:32:03
Speaker
I think I first played with GPT-2 and I was like, okay...
00:32:11
Speaker
nice, okay. Because from my perspective the research was super intriguing, and how they went about it, and the result. I was impressed. I mean, compared to anything I'd seen before, this was definitely better. And then, well, there was not much that I had played with, and when 3.5 came out and I played with it, I thought, holy shit. And so, okay...
00:32:40
Speaker
This is a bit scary. And finally, so we're sitting in the Alan Turing room. ah When it comes to the Turing test, when I was playing with this thing, i was thinking...
00:32:51
Speaker
okay, I know that this thing is a machine, but if I give this to anybody else, I don't think they would know. They would think maybe I'm talking to someone that, yeah, maybe is forgetting things here and there, but is otherwise pretty coherent.
00:33:17
Speaker
And that, I thought, was insane. I agree. And I think that was the point when philosophy had to think of something else. Definitely. I mean, it is so interesting, I think, how humans adapt, right? Yeah.
00:33:33
Speaker
Today, it's fairly easy to spot if an answer is entirely generated by AI. Or it became more obvious, in my opinion. People started to pick up this kind of skill of detecting this. But back then, like you said, if you put somebody from, let's say, 2005 into our time now, placing them in front of GPT-4, something like that,
00:34:01
Speaker
I don't know if they would be able to spot that this is an AI. No way. Probably no, right? No way. No way. They would just think, oh, some very sophisticated person that knows a lot of shit. Yeah.
00:34:15
Speaker
And can answer, and can type, very fast. That's the thing. That's true. Yeah, you're right. So now, you talked about this entire breakthrough, starting with the first perceptrons, going to the first convolutional neural nets, and then AlexNet, which broke the next barrier in development.
00:34:38
Speaker
Then having some type of sequencing in the output of the neural nets, and then going into the transformer model, which, I might add, Google was sitting on for quite a while and didn't do anything with, which very much surprised me.
00:35:00
Speaker
That's not very much like Google normally is. And now we're in this, I don't even know what to call it, revolution and, I guess, race to the biggest model.
00:35:17
Speaker
That's at least how it seems at the moment. And of course, the transformer model has found uses not only in natural language processing, right? That's also important to say. It's used in many, many different areas where, of course, the normal person doesn't really see what's going on.
00:35:34
Speaker
And I'm sure we will talk about that shortly as well. But it's not only natural language processing when it comes to what the transformer is being used for. And the other areas are also, I would say, just insane in terms of what is going on.
00:35:50
Speaker
So now you've told us a little bit about the history. How does that lead to the present? The present, roughly what's going on at the moment, and this project that you have in mind? Yeah, yeah. I'll try to loop it back to the project, of course. The thing is...
00:36:20
Speaker
And take your time, right? Yeah, I'm...
00:36:27
Speaker
I think one thing that is probably also important to understand is that you see these breakthroughs on the AI level, right? So AI is basically math and software.
00:36:41
Speaker
Yeah. But then the actual implementation of it, the point where things, let's say, come to life and you're able to use them, this requires very sophisticated infrastructure, very sophisticated hardware. I mean, I think NVIDIA is doing a great job in innovating chips and hardware, and the software stack as well for AI, but still, usually hardware is a little bit slower than software, right? Software is always a little bit ahead of the game there.

Infrastructure Challenges in AI

00:37:21
Speaker
and I think what we saw during the past, I don't know, five, six, seven years maybe, is insane speed.
00:37:32
Speaker
So nobody was able to imagine the scale at which infrastructure has to be built, connected, and set up in order to support that kind of workload.
00:37:46
Speaker
And if we take a look at, for example, the chips themselves, right, the NVIDIA chips, you have mainly three dimensions that you need to consider. One is the flops, right? This is the speed of the chips, the amount of number crunching that they can do per time period. Yeah.
00:38:07
Speaker
All right. Then you have the memory. So how much information can we store very close to the processors in order for them to access this information quite quickly?
00:38:19
Speaker
Yeah. And then we have the interconnection between the GPUs, which was, by the way, a very recent requirement, on the timescale of AI, for these chips, because these models do not fit on a single GPU anymore. You need to distribute them across many, many GPUs, and you need to interconnect these GPUs somehow in a way that they're efficiently working together, basically training the AI model, crunching the dataset in order to make the AI converge to something.
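As a toy illustration of what "the model does not fit on one GPU" looks like in code, here is a minimal model-parallel split in PyTorch, assuming two CUDA devices are available; real systems shard far more carefully, but the point stands that activations cross the GPU interconnect on every forward pass, which is why fast links matter:

```python
import torch
import torch.nn as nn

# Minimal "model parallel" split: the first half of the model lives on cuda:0
# and the second half on cuda:1. Every forward pass ships activations across
# the GPU interconnect, which is what fast GPU-to-GPU links are for.
class TwoGPUModel(nn.Module):
    def __init__(self, d=4096):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(d, d), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Sequential(nn.Linear(d, d), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = self.part2(x.to("cuda:1"))   # activation transfer between devices
        return x

if torch.cuda.device_count() >= 2:
    model = TwoGPUModel()
    out = model(torch.randn(8, 4096))
    print(out.device)   # cuda:1
```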
00:38:55
Speaker
um And i I think for me, it's fascinating because these kinds of chips, all right, i mean, flops, it's it's fine. You have a lot of parallel stuff going on on the chip.
00:39:08
Speaker
It's okay. Then you have the memory requirement. It became more and more obvious with AI that you need more memory and you need to address and access it very, very quickly, which was one, I would say, additional requirement that was very
00:39:24
Speaker
strongly pushed by AI training. And the other one,
00:39:31
Speaker
out of these three, it is the one that is mostly pushed by AI: this kind of interconnect of NVIDIA GPUs. And NVIDIA basically had to reinvent the hardware stack outside of the GPU in order to make it more efficient to run AI workloads on them, right? You have something like the SXM sockets, where the socket is directly molded onto the server mainboard, which allows the GPUs to bypass the main memory of the server and directly communicate with each other on a single server.
00:40:13
Speaker
And then you have something like NVLink, which is a switch that is also molded onto the mainboard of the server, which allows you to interconnect several servers and lets the GPUs directly communicate via this link, which is completely controlled by NVIDIA, across several servers. And this is the kind of technology that you actually need today in order to train these large models in an efficient way. And, I mean, in my opinion it makes total sense to have this kind of setup in, let's say, central data centers, because there
00:40:56
Speaker
you have more control over these things, right? So you can basically connect everything via NVLink, and whenever you're not able to connect it via NVLink, you have at least some high-throughput cables between the servers, which you can use in order to transmit the data.
00:41:15
Speaker
And now...
00:41:19
Speaker
As long as, let's say, we have enough of these things available to everyone, we can just go with it. But it's not the case. Because the case is that we had a shortage in availability regarding
00:41:36
Speaker
these things. So for some period of time, it was really hard to get access to GPUs, to enough GPUs, even for companies. Because these companies were using them themselves, or they were renting them to other big AI labs, which paid a lot of money to use them.
00:41:54
Speaker
Or people used them for Bitcoin mining. Yeah, that's true. That's true. That came as well. That was also a very important factor back then.
00:42:05
Speaker
And i mean, in my opinion, this is the point where innovation doesn't stop there, right? So we don't stop to develop AI models just because we don't have access to enough compute. We try to figure out ways in order to...
00:42:21
Speaker
train our models even if we have less compute available or yeah even if the compute is set up in a way that is not optimal for training these models. And this is, I would say, one
00:42:35
Speaker
very prominent event that happened recently that actually showed that people are able to figure out things even when working under hard constraints, is this kind of DeepSeek moment, right? Yeah, yeah. And I mean, this moment, I would say, is also something that helped us a little bit. So just briefly, to describe this DeepSeek moment and why it's so important:
00:43:04
Speaker
DeepSeek released a model and they released a paper describing in much detail what kind of technology they use on very low-end chips that are available in China due to the shipment restrictions on GPUs from America to China.
00:43:22
Speaker
And they were able to implement... a training algorithm for these models on their limited hardware. Yeah, yeah,
00:43:33
Speaker
And this model was released on... I don't remember exactly the date, but it was a Thursday, I know, because I just checked the release. It was insane. I was really hyped by that paper.
00:43:48
Speaker
And then... Friday, right? Saturday, Sunday, stock markets are closed. And then on Monday, people started to pick up this model, right? The signal went through all the layers there and the NVIDIA stock basically dropped by 17%, which was 600 billion of market cap, just like that, gone. I mean, the DeepSeek moment, the technology is nice, right? But this had a really huge impact on our economy. Yeah, it does. And this is basically why the DeepSeek moment was so impactful and was perceived as important.
00:44:32
Speaker
And
00:44:35
Speaker
In the end, I must admit it was great for us, because we are basically system engineers. We are doing distributed systems, system engineering for AI models, for inference, for training.
00:44:49
Speaker
And this actually showed people, let's say outside the inner circle, the importance of system engineering for AI, which was great for us.
00:45:04
Speaker
And we were back then working on a project, and we are basically still working on it, which should...
00:45:12
Speaker
One high-level objective is basically to allow more people, more researchers, more AI labs to have access to compute infrastructure as they need it, right? Yeah. So it should be affordable, it should be accessible.
00:45:29
Speaker
And with our solution, we try to scan the whole GPU on-demand market for all the suppliers out there. It could be suppliers here locally, right, in Germany or in Europe. It could be the hyperscaler suppliers.
00:45:44
Speaker
All of them are offering GPU resources on demand that you can rent and run your models on. And we try to scan the whole market and to take a look at what the prices currently are out there, for different things: for spot instances, for reserved instances, for instances that you can just use on demand.
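Purely as an illustration of the kind of price-and-constraint filtering being described, a small sketch in Python; the provider names, GPU types, and prices are made up and are not Exalsius' actual data model or real quotes:

```python
from dataclasses import dataclass

# Hypothetical offers collected from a market scan; all values are illustrative.
@dataclass
class GpuOffer:
    provider: str
    gpu: str
    price_per_hour: float   # USD
    spot: bool              # spot/preemptible capacity is cheaper but can vanish

offers = [
    GpuOffer("hyperscaler-a", "H100", 6.50, spot=False),
    GpuOffer("hyperscaler-a", "H100", 2.90, spot=True),
    GpuOffer("eu-provider-b", "H100", 3.40, spot=False),
    GpuOffer("eu-provider-c", "A100", 1.60, spot=True),
]

def cheapest(offers, gpu="H100", allow_spot=True):
    """Pick the cheapest offer that satisfies the GPU type and spot constraint."""
    candidates = [o for o in offers
                  if o.gpu == gpu and (allow_spot or not o.spot)]
    return min(candidates, key=lambda o: o.price_per_hour)

print(cheapest(offers))                     # spot H100 wins on price
print(cheapest(offers, allow_spot=False))   # on-demand/reserved only
```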
00:46:02
Speaker
This is the first step. The next step would be to take an AI workload, which is, for example, training. So a researcher wants to train a model. And then our system will take care of the provisioning, the distribution, the handling of the dataset and the distribution of the dataset onto the available GPU nodes.
00:46:22
Speaker
It will also handle the robustness there, because if you use spot instances, they can just die. You need to shift things around there. And we would like to enable this kind of parallel model training, which means that we not only provision a certain
00:46:42
Speaker
infrastructure at a certain supplier, at a certain data center, but we also try to distribute the workload across several data centers, which gives us the opportunity to scale more. But we also need to take care of certain things there, right?
00:46:55
Speaker
So this is basically the overall objective. Why are we doing it, or why do I think that this is important? I think
00:47:09
Speaker
In my opinion, I can totally see a future where we will move towards more customization of AI. So it's not like we have this one or two or three chat models that we will use to solve certain things. But I think humans...
00:47:30
Speaker
they get used to certain things and then they want more. They want things that are more custom. They want things that solve their issues. For example, they probably want an AI that is basically understanding their needs, understanding their way of talking, understanding the requirements that they have at the moment, right? This is for individual users, but also for industries, also for companies.
00:47:54
Speaker
If I'm using a product which is integrated well with AI, I would assume that this product, or the AI behind this product, is tailored to the specifics of this product.
00:48:05
Speaker
And I'm pretty sure that we will have this kind of future. I just don't know when it will happen. It could be maybe 10 years, maybe 20 years, maybe five years, I'm

Democratizing AI Access

00:48:18
Speaker
not sure.
00:48:18
Speaker
But I think in this kind of a future, it would be nice to have, or it's very important in my opinion, to have a layer that provides you on-demand access to compute resources whenever your AI needs it.
00:48:35
Speaker
So it's not like you need to set up an account at a hyperscaler, set up your infrastructure, dive into the specifics of a specific cloud computing provider. But no, you can imagine something like a grid.
00:48:51
Speaker
A grid where you can just plug in your AI, it receives the compute that it needs in order to update itself, and then you're good to go.
00:49:03
Speaker
You just pay for that. And this kind of grid would be... It would be possible for this grid to optimize for the current market supply.
00:49:14
Speaker
In the end, you have a really broad range of suppliers for this AI compute market, and you try to optimize for availability, certain constraints, and also the price, obviously. And this is the kind of thing that we would like to build: a platform that is abstracting this.
00:49:32
Speaker
And at the end, the overall goal is to simplify the access for AI researchers, and AI companies, AI teams,
00:49:45
Speaker
to simplify the access for them to the infrastructure that they need in order to develop their AI.
00:49:54
Speaker
Okay, well, this is a lot. Give me a moment. Yes, I mean, of course, you've already told me about this project. Is the name already public?
00:50:07
Speaker
Yeah, Exalsius. Exalsius, all right. And so you said... Yes, so that's a lot. Let me quickly process this myself, because you haven't told it to me in that much detail.
00:50:19
Speaker
So I think what's important for everyone to understand... I mean, everybody that works in this space understands the pain they have to go through if they even want to start doing something like model training.
00:50:34
Speaker
Everybody that is working in this space understands how painful it is currently. And not only when it comes to... let's assume you already have all your data labeled, ready to go, and you can now train.
00:50:53
Speaker
It's unfortunately not like you press enter, then you wait, and then it's like a microwave. Ding! It's finished. No, unfortunately that's not how it goes. You have to do many, many, many little steps in between.
00:51:09
Speaker
And none of them can go wrong. If they do, you're fucked. And you piss away a lot of money. and Most of the time unwillingly. um And that happens all the time.
00:51:22
Speaker
And even if that weren't there, you have so many other things to worry about. So I think what's very important for people who don't work in the AI space to understand is: even if you have your data ready, it is not just hitting enter and your model training takes care of itself. Unfortunately, that is not how it goes.
00:51:43
Speaker
And I often like to talk about this in layers of abstraction. As you said at the beginning, we had the CPU, and then you had the instruction sets, and then you had another layer of abstraction. You had CUDA, another layer of abstraction. Then it was integrated into easily accessible programming languages like Python. And then people didn't have to worry about the layer below anymore. So, let's say, every professional or expert working on the individual layers worked on tweaking those layers, let's say to perfection, so that the person working on the layer above doesn't have to worry about it anymore.
00:52:25
Speaker
That's why most of us don't program in assembly anymore, right? In fact, most people don't even know what that is, which is fine, right?
00:52:36
Speaker
You don't have to worry about it. And that's exactly the point. And we get to the next layer of abstraction. And in the next layer of abstraction, and I really like how you put it, if you want to train a model, if you even want to try, at the current moment,
00:52:55
Speaker
you don't have much of a choice of how to go about it. You can either use a cloud provider, or you could maybe go to your own data center and rent some servers, or, if you want to be the
00:53:16
Speaker
100% bootstrap person, well, use your GPU at home and wait 50 years, and that's if you're lucky, right? It's not going to work anymore. And I think it's very important for people to understand. That's why I really liked that you put it into perspective of how the development was and the insane scale we have reached.
00:53:42
Speaker
And I think this is difficult for people to understand, because yes, we have the flops of the GPU and we have all these parameters. You know, how big are the latest AI models? What is it, 700 billion parameters?
00:54:00
Speaker
And what I think people need to understand is these 700 billion
00:54:06
Speaker
parameters need to be adjusted over and over and over again until this model that holds all these values reaches a specific level of performance.
00:54:23
Speaker
And there are different ways to go about it, but the hardware has to be there to handle this. Now, 700 billion parameters, you don't just fit that like a three-parameter regression model. That's not possible. You have to, as you said, split it, divide it, orchestrate it, and all that has to happen on hardware. And what I also like about what you said is that NVIDIA went and developed new ways to go about it, right?
00:54:59
Speaker
it It's like, I always like to give the the example of a hard drive. If you have a hard drive that is on your laptop, right? Your SSD or NVMe that's soldered onto the chip, that's the fastest it can go.
00:55:16
Speaker
or you have something external that you plug in, which is then maybe USB 3.0 or Thunderbolt or whatever it is. it it's It works, but it can never be as fast as the thing soldered onto the chip.
00:55:28
Speaker
And the same goes for what you just said. So we have the cloud providers, we have the data centers or the trusted GPU at home. and Each one comes with pros and cons of what you have to think about.
00:55:42
Speaker
And I think that's ah sometimes I find it a bit unfair when I hear these these conversations in the workplace or when I work with customers. Then they, let's say the the project manager, it's just saying, yeah, well,
00:55:57
Speaker
How difficult can that be? And I think, you have no idea what you just asked. You cannot expect small companies to have people in-house with the expertise of those working at OpenAI.
00:56:17
Speaker
And at OpenAI, they basically don't do anything else except orchestrate all of this. And this becomes then...
00:56:30
Speaker
The other question of mine: is it even possible? I mean, I really like how you talked about it. But at the current moment, is it even possible for, let's say, the normal person? And with a normal person, I don't mean a single person, just not me at home, but even a small company, to even compete anymore? If I don't have...
00:56:57
Speaker
I don't know, a hundred million in Series A funding, I mean, how would I even go about it? Because something like you just explained doesn't exist yet.
00:57:09
Speaker
And if I were to be a startup company asking for a hundred million, I, well, basically have to use a hyperscaler.

AI Training Phases

00:57:20
Speaker
And if I use a hyperscaler, well, they're not going to give me my model file, I guess.
00:57:25
Speaker
I'm tied. I'm tied to them. So how do you... and maybe that will bring us now to the nitty gritty details, right? How is model training happening at the moment?
00:57:38
Speaker
Yeah, I mean, there's a lot of things to take apart there, definitely. How model training is done at the moment: I think it might make sense to think about the steps that the model goes through when you train it. So you basically have two steps, and the second step can be split into two sub-steps, but let's start with these two steps. One is the pre-training and the other one is the post-training phase. So during... Oh, yeah, let me just add: let's assume we have all the data already ready to go. Let's assume exactly that. Because you're totally right, this is basically one of the...
00:58:22
Speaker
huge, biggest values that these big AI labs have, right? They have these curated, clean training datasets, which are really, really hard to get, and to get to a quality which allows you to train the best model, at least during pre-training. And during pre-training, I mean, that's a very interesting thing, because during pre-training the training itself is fairly simple.
00:58:53
Speaker
It's an autoregressive approach based on the transformer, where the transformer tries to predict the next token. You give it a sentence, and the task of this model is to predict the next, let's say, word.
00:59:07
Speaker
And you do this millions and billions of times on trillions of tokens. This is just crunching datasets, and you try to compress the information that is out there in this text data, let's focus on text data just for simplicity, into the weights of these models. This is pre-training. And in pre-training you cannot do much on the training algorithm side, because it's just autoregressive prediction. You have a very simple loss function and you try to optimize on that.
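A miniature sketch of that pre-training loop in PyTorch: a tiny causal transformer predicts each next token, and the cross-entropy loss on the shifted sequence is the "very simple loss function". The vocabulary, model size, and random stand-in data are illustrative only:

```python
import torch
import torch.nn as nn

vocab, d_model, seq_len = 1000, 64, 32

class TinyCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        # Causal mask: position i may only attend to positions <= i.
        L = tokens.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.head(h)

model = TinyCausalLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Pre-training in miniature: crunch batches of token sequences,
# predict each next token, and minimize cross-entropy.
for step in range(3):
    batch = torch.randint(0, vocab, (16, seq_len))   # stand-in for real text
    logits = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    print(step, loss.item())
```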
00:59:45
Speaker
Basically during that phase, you can imagine that the model is learning two things. It learns how to talk. And it learns knowledge. So it compresses the knowledge that is encoded in the text and tries to transfer this into the internal representation inside its weights.
01:00:05
Speaker
By the way, one very interesting aspect there is that nobody's really able to explain how the model learns in this kind of training. How does the model pick up the ability to write song lyrics? Or to write poetry, or to write things that it never saw in the training data? This is called something like emergent abilities. There's a paper around it which actually talks about emergent abilities of these LLMs.
01:00:35
Speaker
When you just scale the amount of data and the amount of weights that you train with this data to a very huge scale, then these models somehow magically pick up concepts that are intrinsically encoded in the text, but not explicitly. So this feels like magic. It does feel like magic. That's insane.
01:01:02
Speaker
And then basically after the pre-training phase, during this initial phase, you have a very, hopefully you have a very capable model that is able to predict the next token.
01:01:14
Speaker
However, the model at that phase is not really useful. For example, if you would ask this model to say, what is a concrete method of post-training an LLM?
01:01:27
Speaker
It will output something like SFT. It's correct, right? But it's not really useful. You would expect this model to explain a little bit, right? To put a little bit of context around this answer, to give you dense but very digestible information in some kind of text.
01:01:45
Speaker
And this is what happens in the first phase of the post-training. You don't add additional knowledge to the model, but you teach the model to extract the knowledge in a way that we humans perceive as useful.
01:02:03
Speaker
And this is also where we add some kind of constraints to this model. So it shouldn't output things that are potentially dangerous. It shouldn't explain to you how to build dangerous things or how to print weapons or whatever, right? So you try to teach the model to answer in an appropriate way, in a useful way.
01:02:27
Speaker
And this is done by a set of different methods there. I won't dive into this, because I myself am not a 100% expert in post-training. But on a conceptual level, the first phase of post-training is exactly that.
01:02:41
Speaker
First of all, pre-training: teaching the model how to talk in general, the knowledge of language, and the knowledge itself, the factual knowledge encoded in the model. And the second part is how to access this knowledge in a way humans perceive as useful.
01:03:00
Speaker
And
01:03:03
Speaker
Both of these training procedures you can define, from a higher level, as training on examples. You show the model examples, you tell the model if it was right or wrong when it was guessing, and then you apply backpropagation, adjusting the weights so that the probability is higher that the model, the next time it sees this example, predicts it correctly. So this is the overall concept of these two phases.
01:03:33
Speaker
That's why sometimes when you use ChatGPT, it gives you two outputs and says, which one do you prefer, right? Very good. Yeah, a very good example. This is basically a contrastive learning approach, where it's outputting two separate ways of answering a question.
01:03:53
Speaker
And you teach the model which one is preferred more by humans. Yeah, yeah. That's perfect. Exactly. This is post-training. So you train it to adapt to your preferences.
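The "which answer do you prefer" mechanic maps onto preference-optimization losses such as DPO. The sketch below is a rough illustration under that assumption, not necessarily what any particular lab does; the log-probability inputs are hypothetical summed token log-likelihoods per answer.

```python
# Hedged sketch of a DPO-style preference loss: given a prompt with a
# "chosen" and a "rejected" answer, push the policy's log-probabilities
# toward the preferred answer relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Reward the policy when it prefers the human-chosen answer more
    # strongly than the reference model does.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```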
01:04:06
Speaker
And the next thing, which is more recent, is the chain-of-thought thing, right? That is reinforcement learning. Reinforcement learning is a little bit different from the other parts. There, you don't teach the model by example, but you let the model...
01:04:21
Speaker
Learn by self-play. This is also a concept that is very prominent with children, right? Children try things out all the time, and then they fall and something bad happens, but sometimes something good happens, and they learn how to interact with their...
01:04:38
Speaker
environment. And this is what happens in post-training with this reinforcement learning step. And this reinforcement learning can be applied on tasks that are verifiable.
01:04:50
Speaker
So basically you give a task to a model and you know the correct answer, but you let the model figure out a way of achieving the correct result.
01:05:01
Speaker
And the same for all the steps it did in between: if it ended up at the correct answer, which you can check, for example, for math or also for coding, where you can verify whether code runs correctly or not, then you reward the model, and you also reward the steps the model took in order to arrive at that answer.
01:05:26
Speaker
And if it's wrong, you basically punish the AI model, you reduce the reward. You give a negative reward to the model, which allows the model to decide that the chain of thought it applied to achieve this result is not optimal.
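As a rough illustration of reinforcement learning with verifiable rewards, the sketch below scores a sampled completion with a programmatic checker. The callables `sample_completion`, `extract_answer`, and `check_answer` are hypothetical helpers you would supply, not a real library API.

```python
# Sketch of the reward signal for RL on verifiable tasks: sample a chain
# of thought plus a final answer, check the answer programmatically, and
# reward or penalise the whole trace.
from typing import Callable

def verifiable_reward(prompt: str,
                      sample_completion: Callable[[str], str],
                      extract_answer: Callable[[str], str],
                      check_answer: Callable[[str], bool]) -> float:
    completion = sample_completion(prompt)   # reasoning steps + answer
    answer = extract_answer(completion)
    # +1 if the checker (unit tests, a math verifier, ...) accepts the
    # answer, -1 otherwise; a policy-gradient step then reinforces or
    # discourages the sampled chain of thought accordingly.
    return 1.0 if check_answer(answer) else -1.0
```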
01:05:45
Speaker
And these are basically the three steps of how an AI model is trained today. The first step, the pre-training, is where you need a huge, really huge, amount of compute.
01:05:59
Speaker
This is number crunching. This is the thing you need to run for days, sometimes weeks, on thousands and tens of thousands of GPUs in order to get a base model, the model after pre-training is called the base model, that is useful and that can be applied to other domains in an effective way. And training base models at the scale of hundreds of billions of parameters isn't feasible for small companies at the moment.
01:06:36
Speaker
That's for sure. However, there are very nice techniques where you can, for example, use these base models. You can post-train them, and you can also apply something like distillation, where you try to extract the knowledge from the big model into a much smaller model on a very specific and narrow domain.
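One common way to do the distillation described here is to train the small student model to match the teacher's softened output distribution. The snippet below is a minimal sketch of that idea with placeholder logits, not the recipe of any specific lab.

```python
# Minimal sketch of knowledge distillation: a small "student" is trained
# to match the output distribution of a large "teacher" on domain prompts.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions and minimise the KL divergence so the
    # student mimics the teacher's token probabilities.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_p,
                    reduction="batchmean") * temperature ** 2
```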
01:06:54
Speaker
And these kinds of things are done by startups and also by smaller companies, by industrial companies. I would say by companies that are not the big AI labs at the moment.
01:07:09
Speaker
And I think this is the point where we see the opportunity for a lot of companies to thrive, right? To take the base models and to distill or fine-tune them to their own requirements.
01:07:26
Speaker
However, when I talk, for example, with the startups that we're in touch with, or with our customers, the main pain point I hear, and this exactly matches what you said, is that it's just too complex at the moment to set up the training pipeline, to set up the lifecycle management of these models, to take care of the data distribution, to take care of optimizing the data distribution.
01:07:56
Speaker
to take a look and estimate what the return on investment of a very specialized model is compared to a model that you just take out of the box, that is already out there. Which is the main reason why many, many companies at the moment, startups especially, are using these APIs, right? They know these models are not perfect for the use case.
01:08:18
Speaker
It's like eighty-twenty, right? It's 80% there, but it would be nice to have a few percent more for my use case, to be more useful to their specific customers, to their specific user groups, but they don't go this way
01:08:33
Speaker
due to the complexity it takes to set up this infrastructure for training, to manage your distributed training runs, to manage your model lifecycle, to maintain your data sets in combination with the models. All of these things are just hard and difficult to do. Very difficult.
01:08:52
Speaker
And yeah, hopefully we can solve exactly that pain point. Yeah, for sure, because... What I've seen currently taking place: even if, let's say, a general purpose model is going to get you there maybe just 70% of the way, not even 80, which is still a lot, 70 out of the box,
01:09:15
Speaker
I've seen people just build systems around it. They just have, you know, one prompt doing one thing, another prompt checking the output of the other, and you just get a system. And...
01:09:26
Speaker
then you might be able to, quote unquote, engineer your way up to, let's say, maybe 85%. But that's radically cheaper and faster than training something up.
01:09:40
Speaker
But it means you're dependent. You stay dependent. And I think that's a very crucial aspect here, right? Sure, you can try to train something up, but...
01:09:56
Speaker
The speed, and I think that's what you talked about at the beginning, the speed of development is so insane that in order to catch up, you need a lot of money. And it's not necessarily about
01:10:14
Speaker
getting the super smart people in, because, let's say, the cat's out of the bag. We know roughly how to do it, but it is exactly what you just said. It's setting up the infrastructure, and not only the infrastructure to train the models, but the entire infrastructure on top to be able to monitor, handle all the data, process the data, have it correctly labeled, and feed that back into the model training system. It's not just, well, it would be nice if you just hit enter, the thing is trained, you get your file, and it's done.
01:10:56
Speaker
No. And now also with, let's say, the regulations that came in, man alive. It's a never-ending story of complexity behind all of these things you want to do, and as a

Fine-Tuning AI Models

01:11:13
Speaker
startup, right. So I think it's definitely interesting that you already said, okay, we have the pre-training, we have the post-training, which to me makes it already quite interesting, because as a startup,
01:11:28
Speaker
if I were to start a small company, I would also try to understand things at the beginning, because that's what I meant with the noise that's going on out there, so many different terms are used. I'm just gonna probe your brain a little bit, just for my own curiosity.
01:11:47
Speaker
So when these big companies talk about fine tuning models, what are they exactly talking about? We're not talking about the pre-training, I guess, because that would just be too expensive to do, right? They have their base model, the curated perfect clean slate that already knows roughly how to talk, not in a useful way, then in a half useful way, I guess,
01:12:12
Speaker
And then comes the fine-tuning. Or, let's say I go to OpenAI and I want to use, what is it now, GPT-4.1 or 5? I don't even know. Can I fine-tune that?
01:12:34
Speaker
And how does the fine-tuning go? Is it just like a system trick where they maybe put
01:12:42
Speaker
a base prompt before I actually send my prompt, which I don't see, and the answer already goes a bit more in the direction I would like? Or do they add a potential additional training layer on top?
01:12:57
Speaker
I'm sure there are many ways to go about it. I'm just trying to understand what they mean with fine-tuning. Yeah, it's a very good question. I mean, the example of OpenAI is very difficult to answer because they are not disclosing it, right? Right. They're doing it in a closed way. But we can take a look, for example, at the open source community that is working with the Llama models. Yeah, let's do that. So basically the fine-tuning depends a little bit on the use case, right?
01:13:27
Speaker
So, for example, you might have a use case where you want your model to answer in a very certain way, which is perceived as useful by a very specific role in your company.
01:13:38
Speaker
For example, take your accountants, right? Your accountants expect your model to use a certain language, use certain terms, and maybe be very formal in answering requests.
01:13:52
Speaker
So you apply something like supervised fine-tuning, or DPO, or RLHF. These kinds of things tweak the model itself in the way that it answers.
01:14:06
Speaker
So it has the knowledge, it collected the knowledge during pre-training already, and it answers in a way that is perceived as more useful
01:14:17
Speaker
by the roles that will be the end users of the models. So, for example, your accountants in the company. This is one way of doing it. So this fine-tuning then refers to certain methods in the post-training.
01:14:32
Speaker
I see. The other way is, for example, if you have, let's assume that you have a very, very specialized domain. Let's assume you are a company that is building...
01:14:44
Speaker
turbines, right? Very complex stuff. It's not available on the internet, so there is no data about how they work, right? However, you would like to build an AI model that helps your engineers and designers internally in your company in designing these things, or helps them gather knowledge and formulate your manual texts, things like that.
01:15:13
Speaker
The model that was pre-trained on, let's say, internet data will not perform very well in that domain, because it never saw the kind of information that is required in order to formulate good and correct texts in that domain.
01:15:28
Speaker
So here you need to take another approach. Usually you could do a fine-tuning run, which is related to post-training. You can train it end-to-end for certain epochs, and then you can hope, and usually it works, that it will pick up the missing knowledge and encode it into the weights. And then you can do a post-training, like this fine-tuning that we talked about, and it's fine.
01:15:57
Speaker
Or you can do things like, and there is very interesting research going on there, low-rank adaptation, where you basically put additional weights on your existing model.
01:16:08
Speaker
You basically add additional capacity to this model, which was not trained before, and then you try to adapt this new capacity to the new knowledge that you would like to implement into your model.
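A minimal sketch of that low-rank adaptation idea is shown below: the pre-trained weight matrix stays frozen and only a small low-rank correction is trained. The rank, scaling, and wrapped layer are illustrative assumptions, not a recommendation or a specific framework's API.

```python
# Sketch of LoRA: freeze the original weight matrix and learn a small
# low-rank update on top of it.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        self.base.weight.requires_grad_(False)      # frozen pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_a = nn.Parameter(torch.randn(in_f, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, out_f))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen projection plus the trainable low-rank correction.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scale

# Usage sketch: wrap a projection layer of a pre-trained model and train
# only the LoRA parameters, e.g. LoRALinear(nn.Linear(1024, 1024)).
```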
01:16:29
Speaker
Also possible, right? It's very difficult to say which one is the correct way, because it really depends on the use case, and also the data, and even the model architecture, because for different model architectures you usually take different strategies. Again, we are talking about AI, and AI is still heavily reliant on experimentation. Yeah. There's nobody out there that...
01:16:59
Speaker
I would claim nobody out there can tell you exactly which kind of architecture for your, for example, LoRA or low-rank adaptation you need in order to fit certain domain knowledge to an existing model.
01:17:12
Speaker
It will be explored by running experiments until you find something that works well. And if they knew, I'm sure they wouldn't tell us anyway. Because that's, I would say, the pot of gold at the end of the rainbow, if you know how to do that for every single knowledge domain. But I think that's already super interesting because...
01:17:37
Speaker
I'm sure for me using ChatGPT, with them doing some quote unquote fine-tuning to suit me better, they won't do any LoRA approach, because why would they, if this guy is already happy with maybe an adjusted system prompt in between and already likes the output a bit more.
01:18:02
Speaker
Well, that's the lowest hanging fruit, right? So why would they bother? There's no need to invest into retraining anything for the sake of one person, right?
01:18:13
Speaker
But when it comes to, and I think then we're back to the levels of abstraction. What I thought was quite interesting is when you said that with pre-training these things learn how to talk, and the knowledge is encoded into them.
01:18:30
Speaker
And that's the massive compute that you need, and all of these examples. Let's say you are this company that wants to make these turbines and wants to create an AI that helps your engineers.
01:18:44
Speaker
And you're not a traditional AI company, let's say. It would cost so much money, I think, to do, let's say, blank pre-training on top of a base model, because, well, maybe something goes wrong.
01:19:05
Speaker
Right. As you said, with the experimentation, maybe you get a good clean slate that already has very good pre-training scores, and then you add your domain knowledge, readjust the weights, and then you fuck it up.
01:19:19
Speaker
And then, yeah, you spend a lot of money. So I think it's already quite interesting that you say, okay, you take this as a possibility. You take this thing and you just add another layer.
01:19:33
Speaker
And only this layer is then being adjusted, not the base, right? That's what LoRA does, right? Okay. And I guess also for that there are dimensions of complexity: well, maybe I add one layer, maybe I add two layers, maybe I add a hundred layers.
01:19:54
Speaker
Are there no best practices here, or are there? I mean, it's hard to say. There are best practices in a general way, yes, but again, every...
01:20:09
Speaker
new approach or domain that you want to realize with LoRA requires a certain extent of experimentation. You need to try things out, exactly like you said. This is called hyperparameter experimentation, right? How many layers do I add there?
01:20:24
Speaker
How broad is my LoRA layer? And what happens to the original models? For example, there's a very nice paper done by Google Research.
01:20:40
Speaker
They actually analyzed different approaches to adding knowledge onto models that were already trained, that had already gone through pre-training.
01:20:53
Speaker
And these models tend to adapt to the new knowledge to some extent, but the amount of hallucination increases. And there's no real explanation yet, as far as I know, why exactly this happens, but it has something to do with your loss surface, right? You're trying to artificially...
01:21:18
Speaker
bend your loss surface to adapt to this new domain, but then something happens in this multidimensional space, you're not really sure what exactly, and the model starts to hallucinate more
01:21:33
Speaker
in other domains. And this is something that is very difficult to control, right? Interesting. Because then, on the one hand, you measure the adaptation of the model to your new domain, which is fine, because you have your data set and you can hopefully test how well it performs on this new knowledge that you implemented into the model. On the other hand, if there are other domains where you expect this model to perform well, you need to verify that it's still performing well there too.
01:22:09
Speaker
Or whether the performance has dropped due to hallucination or whatever other reasons there might be. It's difficult. One thing I'm sure of: the whole pre-training thing, the whole encoding of general information into these huge models,
01:22:28
Speaker
this is really complex. And by the way, there's very interesting research going on, which is still at the beginning, and I hope that more young researchers pick it up: basically the merging of knowledge from different models.
01:22:43
Speaker
For example, I can tell you one example that was shown. I just forgot who the author of the paper was, but it can be found... It was a NeurIPS paper.
01:22:56
Speaker
So basically, they trained the model
01:23:01
Speaker
that was just a base model, right, without any restrictions. It should just answer things, and it was not well aligned, it was somehow outputting bad language, doing racist stuff, you know, the things that you don't want in your model. This was the one model. And then they trained another model which was explicitly trained on being bad, really outputting hurtful language, racist language, all of this kind of stuff.
01:23:36
Speaker
And then they took, both were the same architecture, and they just subtracted, it's a simple subtraction of one model's weight matrices from the other's, and then the model that was basically not aligned became aligned.

Innovative AI Techniques

01:23:51
Speaker
So they subtracted the bad behavior out of the model by a simple subtraction operation. You're joking. It really is in a NeurIPS paper. What? It is crazy, because, you know, we're still exploring, that's great, we are still exploring that loss surface.
01:24:21
Speaker
And we try to find a minimum on this loss surface that is not only the minimum from the perspective of the model, because the model has a perfectly fine minimum if it just predicts the next token quite well, right? But we would like to find a point on the loss surface that is...
01:24:40
Speaker
aligned with our preferences. And somehow this puts us into a surface area where we can just play around with these kinds of linear operations, and then something happens there.
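For readers curious what such a linear operation on weights can look like, here is a sketch in the spirit of the task-arithmetic paper referenced in the interjection below: build a "task vector" from the difference between two checkpoints of the same architecture and subtract it. The state-dict handling is simplified and purely illustrative, not the authors' code.

```python
# Illustrative "task arithmetic" on model weights: subtracting the task
# vector of an unwanted behaviour from the base weights.
import torch

def negate_task(base_sd: dict, finetuned_bad_sd: dict, scale: float = 1.0) -> dict:
    # Task vector = weights after fine-tuning on the unwanted behaviour
    # minus the original weights; subtracting it steers the model away
    # from that behaviour. Both state dicts must share the architecture.
    return {
        name: base_sd[name] - scale * (finetuned_bad_sd[name] - base_sd[name])
        for name in base_sd
    }
```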
01:24:54
Speaker
That's crazy. That is crazy. I need to read that paper. Yeah, I also need to double-check this paper, but it was a NeurIPS paper, which was published, I think, this year at the conference.
01:25:08
Speaker
Hey, just a quick interjection and FYI. After the episode, we had a quick look, and the paper Alex is referring to is a Nature paper called Learning from Models Beyond Fine-Tuning.
01:25:19
Speaker
In that paper, they are referring to another conference paper, which is called Editing Models with Task Arithmetic. You can find the details in the show notes on the website. Now, back to the episode.
01:25:32
Speaker
But I will also double-check it again. It's amazing. It's amazing. It just reminds me of neuroscience research doing, let's say, neuroimaging using fMRI, where you basically... I mean, this is... You do...
01:25:48
Speaker
You always try to create a contrast, right? You have one experimental condition and another experimental condition, and the difference between those will give you the activated areas, what is actually involved in your manipulation.
01:26:03
Speaker
That's why it just blew me away. It brings us full circle back to how we started, which is... holy shit. And the thing is, again, AI is still a very, very young science, and a lot of things there are just currently being explored, right, also by experimentation. And usually the real fundamental theoretical
01:26:35
Speaker
building blocks that underlie all of this, that should explain all of this or at least put it into a coherent theory, they usually follow, right? They follow the experimentation.
01:26:51
Speaker
Yeah. And it's exciting to be in that space, because it's like,
01:27:00
Speaker
it seems like a lot of things are possible. Yeah. And it's very difficult to explain certain things. For sure. I think it still has this magic factor, even to the people that work in that space. Definitely. And they're in the formulas all day, every day, but still, whoa.
01:27:20
Speaker
And what you just said blew my mind. I really need to read that paper. This is insane. Wow. Whoa. Okay.
01:27:32
Speaker
Coming back to the fine-tuning aspects. Assuming what you just said, depending, of course, on the problem you want to solve, different levels of fine-tuning are required.
01:27:53
Speaker
Is there... just maybe as a point of reference for the engineers out there that want to get into it.
01:28:03
Speaker
Each level, as far as I understood you now, is also associated with a different level of complexity and things you might need to take care of. But I want to bring this back to the platform and your project that you want to work on.
01:28:21
Speaker
What would this mean in terms of, let's say,
01:28:27
Speaker
compute costs, right? Because different levels of fine-tuning require different levels of complexity, and different levels of complexity require different levels of computational power or resources. Or am I getting this wrong? Yes, definitely. So I would assume that a LoRA approach would certainly be cheaper than adjusting the base model directly.
01:29:00
Speaker
I would assume. Yeah. But it might have the downside of, hey, maybe the thing is hallucinating a bit much, but I'm willing to take that risk or that chance. But that's a business decision further down the road for some other people to decide. But for the engineers, which one would you say...
01:29:21
Speaker
is the cheapest to go for? Maybe what's the lowest-hanging fruit you should try first? And is each approach associated with a different level of architectural complexity?
01:29:38
Speaker
I mean, let's say I want to do the fine-tuning of the base model. Does that mean I have to have my own data center? Or, if I want to do the LoRA approach, can I get away with hooking 50 computers together at home and hoping for the best?
01:29:58
Speaker
You know, I assume that different levels of complexity also require a different level of infrastructure complexity.
01:30:08
Speaker
Or is that wrong? Yes, definitely, I would say in general that this is true. The thing is the starting point, right? If I were an engineer and I had a problem at hand, the first thing I would try is to solve it with the existing stuff. Yes, right. If that doesn't work, I would start to tweak the prompting.
01:30:32
Speaker
If that doesn't work, I would go with something like preference fine-tuning, like SFT or RLHF, DPO, these kinds of things. Because this is fairly cheap. So this...
01:30:45
Speaker
When you distribute it, it can be parallelized quite well. You can distribute it on spot instances. So if you have a few hundred bucks available, and for a company that's usually not too much of a problem, then you can just try it.
01:30:59
Speaker
The SFT, you mean? Yeah, exactly. And that doesn't need that many examples. No, exactly. You don't need a huge data set for that. Basically, it's a matter of showing the model example outputs that you expect from it, right? How it should be formatted.
01:31:20
Speaker
So please don't use smileys, for example. Oh, yeah, yeah. This kind of thing. I mean, at the moment, GPT-4.5 is suffering from this smiley disease. Oh, yeah, it's amazing.
01:31:31
Speaker
There are too many emoticons, or icons in general. Definitely. So this might be something, right, that with SFT, with DPO, you can just remove this behavior, or you can teach the model not to behave that way.
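As an illustration of how small such an SFT dataset can be in spirit, here is a hypothetical prompt/response file demonstrating the desired tone and formatting (for example, no emojis). The file name, field names, and JSONL format are assumptions, not a specific vendor's format.

```python
# Tiny, made-up SFT dataset: a few prompt/response pairs showing the
# formal, emoji-free style we want the model to imitate.
import json

sft_examples = [
    {
        "prompt": "Summarise the Q3 report for the finance team.",
        "response": "Revenue grew 8% quarter over quarter, driven by recurring contracts.",
    },
    {
        "prompt": "Reply to the customer asking about delivery times.",
        "response": "Thank you for reaching out. Standard delivery takes 3 to 5 business days.",
    },
]

with open("sft_dataset.jsonl", "w") as f:
    for example in sft_examples:
        f.write(json.dumps(example) + "\n")
```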
01:31:44
Speaker
If you have a very specific domain, then as the next step I would go with the LoRA approach. You try to experiment with LoRA. It would be more expensive, but it's still fairly okay. The costs would still be manageable, I think,
01:31:59
Speaker
from that perspective, considering the fact that you don't need to deal with the whole infrastructure setup. Because if you need to deal with the infrastructure setup on a hyperscaler, on some kind of cloud provider, or you have your GPUs yourself and you need to manage and connect them somehow, or you're setting up your Ray platform and you need to keep track of the development of Ray in order to provide your engineers with the best working environment.
01:32:32
Speaker
This is what requires you to hire one or two engineers that are doing full-time exactly

AI Cost and Complexity

01:32:38
Speaker
that. So this is a little bit more complex. And even then, I would argue they might set up a good infrastructure, but still...
01:32:50
Speaker
it's very difficult to always find the best cost-to-performance ratio, right? Sure. The market changes so fast. Today it's like 40 cents an hour for an H100 somewhere, the next day it's 1.50, and the prices are fluctuating. And for LoRA, sorry to interrupt, but for LoRA I assume I need a bit more data? A bit more data, yeah, exactly. And there the type of data is a little bit different, right? You need a data set, and this data set needs to be quite curated, high quality. You need to know exactly what kind of knowledge you're currently encoding into the model.
01:33:30
Speaker
And it's not only that; the best approach there would be not to take only data from your specific domain, but to mix in other domains as well, in order to help the model reduce the hallucination later on, where it overfits on your domain. This is not what you want, right? You still want to strike a good balance between the domains that the model knows.
01:33:57
Speaker
So it would be a little bit more complicated to set up a dataset like that. And as the last resort, and I mean you need to think really carefully if you really want to do this,
01:34:08
Speaker
you have this kind of end-to-end fine-tuning of all the weights of a model in order to fit your domain, and then you need a very good and well-designed data set to do that.
01:34:23
Speaker
Yeah. And then we are basically at the pre-training adjustment. Yeah, at the full model adjustment. And because you start to do that, you have to do post-training again, just to make sure the thing is actually still outputting something useful. Yes, exactly. Combining this LoRA approach and this end-to-end fine-tuning approach with reasonable post-training.
01:34:50
Speaker
It makes sense. Yes, it typically makes sense. Okay, well, if this were my wallet, I already know what I would try first. Yeah. Oh wow. But again, just maybe one word regarding that: I assume that we will see a drop in prices by 10x, maybe even more, during this year and the next year. So right now it's still a matter of consideration whether I want to pay this kind of money in order to realize these training procedures, right?
01:35:32
Speaker
But later on, I think the discussion will be more related to time. How fast am I able to set up the kind of infrastructure that allows me to do that?
01:35:42
Speaker
Because the compute will get cheaper and cheaper, but then the cost perspective will change from the compute cost more to the complexity, the time, and the cost associated with hiring an expert team and setting up your own specific infrastructure, compared to just using a platform that provides you everything out of the box. Yeah, exactly. I mean, as you said at the beginning, right, the hardware is always lagging a bit behind the software, and now data centers are being built left and right, it's insane. Exactly. And not only data centers for, yeah, that was
01:36:25
Speaker
the cloud first. And now, I mean, the data centers running AI models need to have very different specifications than the ones just running cloud systems. Very different. Absolutely, very, very different. Not only when it comes to the GPUs, but their interconnects and the power surges and all of that stuff. We can go into that in a little bit.
01:36:52
Speaker
What I find interesting is...
01:36:58
Speaker
Based on this, let's say, what we can call the logical hierarchy of how you should think about training your model, or whether you should do it in the first place, from what you just said.
01:37:11
Speaker
I find it quite interesting that for most people, for the things the average person needs, they don't need to fine-tune anything.
01:37:24
Speaker
The normal person, with some better prompts to get the model to behave in a certain way, or the output to be in a certain way that's more preferable to you,
01:37:36
Speaker
you'll get there. Yeah. That'd be fine. But for other things, I think it's super interesting to see which logical steps and complexity should follow.
01:37:48
Speaker
And it also makes sense. Now I understand a bit better from a company perspective, you make the turbines, right, to understand, okay, I don't think prompting will cut it. I can already tell this myself, and I know nothing about turbines. Yeah, and the example that you gave is also perfect, because I know a bunch of startups that are doing it exactly like you said. They set up different prompts in order to automate their process and make it fit the AI model. Yes. The other way would be to just take this process, collect a few data points, and then fine-tune your own model. And then
01:38:26
Speaker
Because right now, whenever they update the version of this base model, the API model, they need to revisit their whole process graph and fine-tune each prompt again exactly in order to make it compliant again with
01:38:41
Speaker
this model, which is difficult. I mean, that's added complexity, and it goes a little bit against, you know, the Bitter Lesson, which basically says to just use the simplest approach.
01:38:59
Speaker
Yeah, collect enough data, scale your model enough, which is typically the case for the models out there right now, right? And then it will work much better than something that you engineered on top of it.
01:39:12
Speaker
Exactly. And this is what I wanted to get to. It's a perfect example, because yes, hardware is lagging a bit behind, but now it's catching up so fast that the software people, I'd say the model engineers,
01:39:31
Speaker
would have the ability to train models and all of that stuff, but they don't have the data. But now you have the models to create the data for you. Which also brings us back to the DeepSeek thing, I guess, because I thought it was hilarious that OpenAI complained that it looked like DeepSeek used OpenAI to generate training data, which I thought was hilarious.
01:39:59
Speaker
But the models that have been developed, and I think this is super important for people to understand, mean we don't have to start from scratch again. And then there might be some, I guess, I don't know what you would call this, cannibalism? I don't know. It goes a bit in circles because...
01:40:25
Speaker
On the one hand, I think this is great, because it means we can build on top of the other things, right? You get another layer of abstraction, and that allows you to build another layer of abstraction on top.
01:40:38
Speaker
And now you can do it even faster because of the level of abstraction before it, right? Just as nobody has to code assembly anymore, because now we can do it in Python, with many, many layers in between.
01:40:52
Speaker
And soon nobody has to code Python anymore, because you can just tell the LLM, hey, make me some code. And, damn, it does it. It even tests it for you before it gives it to you. Yeah.
01:41:05
Speaker
Layer of abstraction on abstraction, and the speed increases. But that also means we need to hope, I guess, and make sure, and that might be a point of criticism, that the level of abstraction you build on is solid.
01:41:23
Speaker
Right. Because if you use it to get one up, or everybody uses it to get one up, you need to make sure that it is solid. Just like you should always build a house on solid ground and not on shaky ground.
01:41:38
Speaker
And that's why I really liked your example.

Role of Open Source in AI

01:41:41
Speaker
Build the system that will get you your desired output, then use the output as training data for creating your own model on top of the base.
01:41:52
Speaker
Yeah. I've seen many people try to do this now, and it's crazy. Just as you can ask your LLM, hey, I would like to have this type of output, write me a prompt that would get me that. And it just does it. It's almost like asking, how do I need to talk to you so you will do what I ask, and it will tell you?
01:42:17
Speaker
Exactly. I mean, I think it's really hard for me, at least. I'm not that experienced in prompting image models. I use ChatGPT to just describe something that I have in mind, and then: please provide me a prompt for, I don't know, Flux or something.
01:42:33
Speaker
And then it just does it. It's a really nice prompt. Yeah. Where I came across this: I hate writing documentation. I know it's important, right? We all know it's important, but in the flow while you work, also while you're experimenting, it's so difficult, because the documentation is supposed to be clean, right? This is point one, this is point two.
01:42:55
Speaker
While you're working things out, that's not how the normal workflow goes. So I see all my notes and then it's like, man, I have to write some documentation now, damn it. And then you read through it and think, this makes no sense. And then you need to read through your code again.
01:43:10
Speaker
So what I did was I just threw all my notes into ChatGPT, threw in my code that was working, the final code I was working with. I said, okay, these are my notes. This is my code.
01:43:22
Speaker
I need to write documentation for it. Before you write documentation, please make me a flowchart of this finished system as a Mermaid flowchart, which is basically a code-based way of making flowcharts.
01:43:42
Speaker
It put something out, I threw it into the editor, and it actually showed how the system worked. I was blown away. This was...
01:43:54
Speaker
I just couldn't... this was one of those magic moments, right? Where you just think, what just happened? Because if you give this to a normal human, understandably, they would first yell at you, because the way you throw things down is just all over the place, understandably.
01:44:12
Speaker
And then they would have to look through your code, understand the conventions and all. And this damn thing just spit it out. And I was amazed.
01:44:23
Speaker
So this is what I meant with the layers of abstraction upon abstraction. And I think that's why it's important to make sure that these base models are good. And at the moment, and correct me if I'm wrong, because I don't know that much about the large language base models,
01:44:39
Speaker
I think currently they're mainly from the big companies, right? And no other ones. Are there any German or European ones? Yeah, I mean, you're basically right. That's true.
01:44:53
Speaker
A German, European one. I mean, we had one, which was the Luminous model by Aleph Alpha. Yeah. But they closed-sourced it after they got their investment. They're doing stuff, but nobody knows what. So they basically abandoned this kind of building-in-the-open thing.
01:45:12
Speaker
But still, probably they're doing nice things internally. The other thing is Mistral, right? Yes, we have a European base model, which is much smaller than any other model out there from the big labs, but still something. Then we have China catching up, so they have their DeepSeek models, which are really competitive. And I should also add, they're also open weight, right? Compared to Llama, for example, from Facebook.
01:45:47
Speaker
You can use those, sure, but they're not open weight, right? Yeah, I mean, with Llama, it's probably a matter of definition of what exactly is open weight. The weights of Llama are available, but they have a license that restricts you in certain
01:46:10
Speaker
ways when you use it commercially. I think, I'm not sure, but as far as I remember, it's something like: if you build your app based on Llama and you have more paying users than X, you need to state that it's built with Llama, something like that. Okay. But with Llama you can actually access the weights. Okay. The
01:46:42
Speaker
reason why people actually love the DeepSeek models is that they not only open source the weights, but they write really, really detailed technical reports on what they actually did in order to achieve this.
01:46:58
Speaker
And I think last week they just did a week of releases, where for seven days they open sourced one repository every day, each a small implementation of a certain technology that they use in order to train their models.
01:47:14
Speaker
Which was nice. They have a really, really nice file system that they implemented just for improving the efficiency of model training.
01:47:26
Speaker
It was really cool. Yeah, that sounds really cool. And the open source community likes them because, you know, they're open. Yes, yes. I mean, I think that is so cool, and it's why I still think open source is the way to go for many things, not for everything. I understand why companies want to keep their IP. I get it.
01:47:52
Speaker
But if you look at, I always like to give the example of FFmpeg, right? It's open source, and it sits at the core of many, many video editing programs. Maybe just one thing that I find very interesting about AI and open source: there's a lot of discussion regarding open sourcing models, open sourcing code, and I agree that it is important to have this discussion, but it is not quite the same.
01:48:28
Speaker
Open sourcing AI models is not quite the same as open sourcing software, because, as you said, when you open source code, people can build on top of it. It's incremental improvement. This is why open source code is so powerful. Linux, right?
01:48:44
Speaker
Or Android back then, when it was really open. You can basically incrementally build on top of something amazing that people did already. And you can just add your features. You can tweak it a little bit for your use cases.
01:48:58
Speaker
This is really nice, but this is not how things work with AI in general, right? So it's very difficult to take a base model and to build upon this base model.
01:49:10
Speaker
because it's still expensive to tweak the base model to your requirements. Yeah, that's true. But coming back, sorry for the short diversion, I found this quite interesting: you said at the beginning that we humans like customization and things being adjusted to us, so do you then think fine-tuning is the way forward? A good question. I think some form of fine-tuning would be the way to go forward. I'm not sure exactly what will be the case. There are certain companies that are, for example, betting on distillation.
01:49:48
Speaker
Certain companies are betting on reinforcement learning. Certain companies are betting on this kind of post-training, the fine-tuning stuff that's happening there. I'm not sure.
01:50:00
Speaker
I don't know. I would like to build a tool set that supports the efforts that researchers out there might have in order to improve the models to their requirements.
01:50:13
Speaker
And then, and this might sound super abstract, I'm trying to get it out of you: how do you envision this platform that you want to build? Because maybe I don't see the forest for the trees. All I see is, oh my God, I have to provision this cloud instance, I have to do this, I have to tie this together.
01:50:36
Speaker
Oh my God, I forgot to turn it off, so I have to pay; that rent is lost. Just as an example, that's how AWS mostly goes. You forget to turn something off, and then you see your bill at the end of the month. 3k or something. 3k, and you just... You write to support, please. Yeah, exactly. Please, please, I made a mistake. But they usually just drop it. They're usually nice. Yeah, they're usually nice. That's true. But I still have to suffer the initial shock of seeing the bill. Yes, that's
01:51:11
Speaker
And so you have that. And on the backside of all of this, you also need to... To me, this is just insanely complicated, because somehow you have to match... And let's just talk about energy prices for now.
01:51:31
Speaker
You have to match energy prices to the computational resources, while at the same time balancing that with the availability of the computational resources, which is not directly tied to the energy prices. It should be, but it isn't always.
01:51:53
Speaker
Because now you also have the data centers, the big ones, that are building their own power plants right next door for energy, at least for AI development or inference. That's also happening.
01:52:06
Speaker
How? How? yeah How do you do that?

AI Compute Distribution

01:52:10
Speaker
Yeah, I mean, I would just maybe make two points here, right? Yeah. Let's take the look from the user perspective as I see it.
01:52:21
Speaker
A user for us is generally a person that is developing an AI model. Yes. These people are typically used to working in certain environments. It could be a Jupyter Notebook, it could be a Python script. That's usually it, right? And I would like to integrate very well into existing IDEs, for example Cursor or VS Code. And then, instead of doing all of this very, very complicated infrastructure stuff, you just install our plugin.
01:52:55
Speaker
You just click run. We offer you prices that are currently available, you select one, and then let's go. And this is basically the layer of abstraction that I see on the user-facing side.
01:53:07
Speaker
And all of the stuff that happens behind this abstraction layer, scanning the market, doing the calculations, energy, availability, all of this, these are algorithms that we provide. This is basically the added value you get from us. Yeah, exactly. This is one way to look at it. So this is what I would like: one-click
01:53:29
Speaker
running of AI training workloads. We focus on training. That's one part. The second part is how I see the whole value chain.
01:53:40
Speaker
The value chain is very interesting to see, because in the end it's like a very classical market. Typical markets consist of three things, or two general things: supply and demand, right?
01:53:58
Speaker
And there's one thing in between, which is distribution. So you have the suppliers; these are the providers, the people that are actually running the data centers and providing remote access to their GPU infrastructure.
01:54:12
Speaker
This is our supply layer. We don't go below that. We don't care about how they get their chips or how they manage the data centers. It's fine. They just provide APIs, which allow people to access the compute they need in order to run their workloads.
01:54:28
Speaker
This is the abstraction layer. We call them compute suppliers. And then we have the demand. The demand is AI teams, companies, researchers that are actually in need of compute in order to train their models.
01:54:45
Speaker
And we would like to establish ourselves in between, as a distribution layer. Usually this is very difficult to do, because it's something like a marketplace, right? There are very powerful companies that are actually doing distribution. Let's take Uber as an example. In Uber, you have the supply.
01:55:05
Speaker
This is the mobility, the drivers, right? They provide the good, mobility. Right. And you have the demand, which are users, for example, me after a night out in Berlin, I need some kind of mobility in order to get home, right?
01:55:23
Speaker
So what do I use? I use Uber as a distribution layer in order to find the best... The match. Offer? Yeah, the best match at the current market, right? And they take care of all of this complex stuff.
01:55:35
Speaker
They take care of optimal routing of the drivers, and do some kind of locality optimization, price optimization, demand-supply optimization. I don't care. I just want to go home.
01:55:47
Speaker
And to put this analogy into our context: somebody needs access to compute. Yeah.
01:55:58
Speaker
And there's a huge and complex market with all of this complexity that you mentioned, like the energy stuff and all of this. And we're the distribution layer that just abstracts away all of this complexity and provides the goods that a user, the demand side, actually needs in order to run their AI workload. And why I think that...
01:56:17
Speaker
might work is because we put our focus very sharply on AI model training, which is typically, to some extent, a very homogeneous workload. It can be parallelized, it has some common properties that we can use in order to build a general approach, a framework approach there.
01:56:43
Speaker
Just to provide an example, this idea is not entirely new, right? For example, we had grid computing a few years ago, you remember? Yeah, I remember. And it failed. Massively. It failed massively. But why? Because there was supply, there was demand, and there were grid platforms that should actually provide the distribution.
01:57:03
Speaker
And it failed because of two things. You have very strict SLAs, usually for the apps, for the backends, for the databases, for the mail servers that you want to run on this distributed infrastructure.
01:57:19
Speaker
These SLAs were very difficult to comply with in a setup where every compute supplier could just drop out at any time and, you know, kill your app.
01:57:30
Speaker
This is one thing. And the second thing is the heterogeneity of the workload. Every company had some very specific business logic, some very specific database requirements. All of this was too specific to set up a framework that can serve all of them equally well.
01:57:50
Speaker
And this is the learning, I think, from grid computing that we can take and apply here: focus on one specific workload, do this very, very well, and abstract it away from the user side.
01:58:04
Speaker
Does that then mean, so I'm thinking now, okay, taking your Uber example, does that mean that I, as a data center, can hook into your system so that you have this offering?
01:58:23
Speaker
And I, as a person who just has a spare GPU at home with my computer running, can also hook up? It would be nice to have this. This is not our first focus, but from a vision perspective, that would be nice. For example, I can totally see a future where you have a house, you have a basement, you just put a rack there, you put solar panels on your roof, and then you can earn a passive income by renting out this GPU compute via some marketplace. Yeah.
01:58:53
Speaker
It could be ours or any other. Yes, this is definitely something that I can see. Obviously, right now it's not our main focus, but I hope that we will be able to provide something like that in the future. Regarding the suppliers that can hook up with our platform: yes, I hope that we will soon be in a position to negotiate better prices with them in order to provide better prices to our users. However, at the moment we're just using their APIs and hooking up with them. And the most difficult thing from a distribution perspective is that you need to serve both sides, right? You need to make the users happy, and you need to make the suppliers happy. In our case, we just hook up with the existing APIs, so the supply side is fine, it can be automated. We're just providing value by
01:59:49
Speaker
getting an overview of all the existing APIs out there that provide you on-demand GPUs. This is the first step. And we can purely focus on the user side and provide a very good user experience.
02:00:01
Speaker
And based on that, if we do a good job, I hope that we will get some moat, right, some user base. And this will provide us some leverage in order to approach certain providers. In the first step that probably won't be the hyperscalers, but smaller
02:00:17
Speaker
companies that are popping up, also in Europe, in the US, and also in Asia. Small GPU compute providers building small data centers, and probably we can hook up with them first.
02:00:32
Speaker
Yeah, definitely. And I mean, you're right. As the man in the middle, you have to make both sides happy if you're the connector, basically.
02:00:43
Speaker
And I think it's quite interesting already from the user perspective. Yes, you can do all these things yourself already. It's just so cumbersome. Exactly. And nobody...
02:00:56
Speaker
wants to. At least I haven't seen anyone; I never met one single AI developer that says, hey, let's do infrastructure. Exactly. But what I also find funny: there is not a single developer who realizes they don't want to do it and then just takes a step back and makes a one-click automation for themselves.
02:01:19
Speaker
They're not doing that either. Which is just like, ah, it's so painful. It is painful. It's so painful. And I think if you already make them happy, which can then increase
02:01:30
Speaker
the use case, right? I mean, it will increase the accessibility. The friction is gone, right? That's the important thing. The friction is gone on the user side, which will then increase the model training, right? I mean, that's the funny thing about the efficiency paradox, right? You make something more efficient and you think, okay, usage will stay the same. No, no, no.
02:01:56
Speaker
It goes up. The demand rises, exactly. And it's going to be the same thing here, I guess. You're going to make it easier for the end user, and you're already hooking things up with existing APIs, which is good, right? I totally understand,
02:02:15
Speaker
which then will hopefully show the suppliers: look, we bring you this, and we can bring you more. You know that. So how about you drop the rate limiting for us, or things like this, and we get a better deal.
02:02:33
Speaker
And what I think is also interesting is that you mentioned the small data centers. Yeah. I know there are a lot of companies in Germany, a lot of software development companies, that have invested heavily in GPUs and are not using them. They just sit there. They have their entire infrastructure set up in-house, but they don't really know what to do with it, and it's just sitting there, which is very sad. Yeah, I agree, definitely. It's the case in companies, but it's also the case, for example, in universities. For example, at TU Berlin, where I was doing my PhD, we had a cluster, a faculty cluster.
02:03:25
Speaker
It was built up, it was really nice, great access. You have really powerful GPUs there. And it was sitting idle for months, because either people were aware that it's there, but it was still too complex to access,
02:03:43
Speaker
or they just didn't know that something like that existed. And now, after a certain period of time, load started to roll in, but you still see spikes. You see periods in time, usually before very impactful AI conferences take place, yes, where you have these kinds of spikes, and then, for example, for the whole summer it's sitting idle. That's also not the optimal way to do it, right? No. And it would be great, in my opinion, to utilize these kinds of spare resources, and also to integrate them in a way that whenever something internal rolls in, you just drop our workload and use it for your own stuff. But when you have spare resources, just contribute them, or just generate a little bit of passive revenue for your company.
02:04:38
Speaker
Yeah, definitely. I mean, this reminds me of when I was doing my bachelor's. I was working on my bachelor's thesis, and we were running a certain type of data analysis on the laptop. I already had a MacBook Pro back then and thought it was fast. It still took forever.
02:04:57
Speaker
And so our department worked together to make certain computers in that research department available, so that you could remotely log in, load up your scripts and run tasks there. So that's what I did, manually, and well, it was running.
02:05:22
Speaker
Little did I know that one time I was running things on my supervisor's computer. I didn't know it was his computer; it just had this ID, and I thought, okay, I have to run this analysis 16 times, so let's find 16 computers instead of doing it one by one. But of course they can see if their computer runs slow, what's going on and who's running stuff.
02:05:45
Speaker
He wrote me an email and said, could you please cancel this, because I can't even write a paper anymore. It's kind of funny. But that's what you mean: there are so many resources, also at universities, that are being underutilized, which I think is very unfortunate. And there's this whole debate going on in Germany of, hey, can we catch up? Are we too late? Now all the regulations come in, and do we have to build our own data centers? And data centers are being built in Germany as we speak.
02:06:24
Speaker
A lot. And, well, electricity isn't cheap here at all. So why are they being built here? That might be another four hours we could talk about. But...
02:06:39
Speaker
I think what should be mentioned is that we actually do have a lot of resources. And then it becomes a question, and that brings us back.

Geopolitical Implications of AI

02:06:51
Speaker
Do we have to compete with, let's say, OpenAI? Right? Because which company... I mean, I don't know how much money Aleph Alpha got, but I guess...
02:07:06
Speaker
I would assume it wasn't as much as Microsoft, on their second or third round of investing, just dropped into OpenAI's bank account like pocket change. That's how it felt, and I thought, this is insane.
02:07:24
Speaker
And they were just like, yeah, here's 10 billion. What? What just happened? So I find this quite interesting in our conversation: where in the development of these systems can we still get a lot of gain, and at the same time, do we within Europe have to start from scratch, or, if we start from scratch, should we invest in a specific company that will actually get us there so we can build on top of that?
02:07:58
Speaker
And then also which systems to use. I find this super interesting. I mean, yes, I agree with you: nobody wants to talk about infrastructure. I think people find it annoying and boring.
02:08:10
Speaker
I always like to say: as soon as you notice you have a lack of infrastructure, it's most of the time already too late. Yeah, that's true. It will take you too much time to catch up.
02:08:25
Speaker
And that's why it's important to keep investing in your infrastructure. Because if you never hear complaints about infrastructure, that's a good sign. That's really a good sign. I agree. And yeah, digitalization and so on, we could talk about this a lot; it's a huge topic. Maybe just two thoughts on that. Because from my perspective, catching up is difficult. But AI, and you see it also in the geopolitical landscape between the US and China, AI seems to be a really, really...
02:08:56
Speaker
strategic technology. It has strategic importance on the scale of geopolitics, which is insane, because I don't know almost any other software technology that has this kind of impact. Do you know any? Usually it's the other way around, right? Rockets, that kind of warfare stuff. It was never something like a basic software thing.
02:09:26
Speaker
And this is interesting. I think if we follow this trajectory, it is, in my opinion, important to have at least the possibility to train state-of-the-art models in Europe.
02:09:40
Speaker
And I think it shouldn't be, let's say, a company that is dedicated to earning money with it later, because they will never catch up to the big AI labs, but they don't have to. It is just a strategic way of having something at hand and not falling completely behind. So this is one part.
02:09:58
Speaker
On the other hand, I think from an economic perspective it doesn't make any sense to catch up. I'm not an economics expert, but take a look at the value chain, right? Where is the value generated?
02:10:15
Speaker
The value is generated in layers, and the top layer is always the one facing the end user. And the end user in this case is either us, right? Chatting with ChatGPT, that's one part. But more of the value will be extracted from these models on the business layer. So whenever companies start to pick up this technology to implement it and integrate it well into their...
02:10:42
Speaker
business processes, and to either, I don't know, do things that were not possible before, do things much faster than previously, or do things of much higher quality. Whenever this happens, this is the unlocking of the actual value of AI. And I think we are not behind here, because this is always a matter of understanding the business, understanding the value chains of different industries, and data.
02:11:08
Speaker
Data. And I think here, for example in Germany, in the German Mittelstand, we have a huge amount of data locked into certain companies that is just waiting to be unlocked. It's just something that will take a lot of effort to do.
02:11:26
Speaker
But it is something, in my opinion, really worth working towards unlocking. Yeah, I think so too. And from an economic perspective, we don't need to catch up on the base models. We just take the things that are already out there and apply them very well to certain industries. And this could put us in a position where we can definitely compete.
02:11:53
Speaker
Yeah, I think so too. And if we wanted to, you know, reinvent the wheel here just for our own sake, I don't see this happening from the political side. Sometimes I even wonder if they understand what's happening.
02:12:11
Speaker
That's one thing. But to be fair, I think even China didn't really understand the importance of it. Take a look at the time before the DeepSeek moment. Yeah, yeah. AI was important in China, but not that important. It wasn't the top priority of the upper levels of the party there, right? Yeah, that's true.
02:12:33
Speaker
And the US seems to consider this technology really transformative, and they try everything they can in order to stay ahead of everybody else. Yeah, that's true.
02:12:48
Speaker
And just to circle back to what you said about the platform: so you want to focus on training first. Does that then mean... say I'm now looking at AWS or Google and I want to train something there. I played around a little bit with those systems.
02:13:08
Speaker
I don't think I ever get my model file, right? So when I train something up there, I have to stay on that system to run the inference. Mm-hmm. I mean, in the end it depends a little bit, I would say. First of all, if you, for example, just rent GPUs there, you can do whatever you want with these GPUs: you run your AI model training, you take the result, and it's fine, it's yours.
02:13:36
Speaker
But when you commit to something that is more platform-heavy, like Vertex or SageMaker, then you need to comply with the APIs and frameworks of these specific solutions, and usually you need to adapt your training pipeline to the specifics of these platforms. Which means, when you have implemented a very, let's say, complex pipeline for your model training, it's not straightforward to just switch quickly to another platform or a third platform. And I think there's a certain kind of friction that exists there.
02:14:14
Speaker
Companies, when they commit to, let's say, Vertex, usually stay there, and it's expensive. It is expensive. If you, for example, rent GPUs, install your training stack there and set up the training on the rented GPUs, on bare-metal nodes or infrastructure as a service, you can usually go cheaper by factors of six to eight. That much? Yeah. That's insane.
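To put rough numbers on that factor of six to eight, here is a minimal back-of-the-envelope sketch in Python. The per-GPU-hour prices are illustrative assumptions, not quotes from any provider; the point is only how the gap between managed-platform GPU hours and bare-metal rental compounds over a training run.

```python
# Illustrative, assumed prices in USD per GPU-hour; not real quotes.
MANAGED_PLATFORM_RATE = 6.00   # assumed rate for a managed training service
BARE_METAL_RATE = 0.90         # assumed rate for rented bare-metal / IaaS GPUs

def training_cost(gpu_count: int, hours: float, rate: float) -> float:
    """Total cost of a run that keeps `gpu_count` GPUs busy for `hours`."""
    return gpu_count * hours * rate

# A hypothetical fine-tuning run: 64 GPUs kept busy for two weeks.
gpus, hours = 64, 14 * 24
managed = training_cost(gpus, hours, MANAGED_PLATFORM_RATE)
bare_metal = training_cost(gpus, hours, BARE_METAL_RATE)

print(f"managed platform: ${managed:,.0f}")              # ~$129,000
print(f"bare metal:       ${bare_metal:,.0f}")           # ~$19,000
print(f"ratio:            {managed / bare_metal:.1f}x")  # ~6.7x
```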
02:14:48
Speaker
And this is, I would say, the margin that these companies have. If you take a look, for example, at the Azure financial report, they are operating at a 50% margin.
02:15:02
Speaker
50% is a lot, right? It's a lot. 50% of that money is just to cover your costs, and the other 50% is your profit.
02:15:12
Speaker
And I think every industry that operates on this kind of margin is exploiting economic inefficiencies of the market in order to bend the market rules to its advantage. It is fine to do that as long as you are ahead of the game, right? But at some point it will backfire, because your whole business is set up on these margins. Your investors are investing because you have these margins.
02:15:41
Speaker
But if you're not able to defend them for an extended period of time, it will be hard for you to stay in business, because there will be new players. I mean, there's a quote from Jeff Bezos, right? When he founded Amazon, he said: your margin is my opportunity.
02:16:03
Speaker
Yes. And that's exactly, I think, what is happening in the compute market, and I think there will be some disruption there. Because at some point it becomes a race to the bottom. Yeah. Because at the end of the day, maybe I'm okay with a 40% margin. Yeah.
02:16:18
Speaker
And the other one is okay with 30%. And we saw this happening in telecommunications as well. True. Nobody wanted to talk about this when they put the fiber optics up and the throughput, the bandwidth, was massive. And they said, ah yeah, we can just keep charging more.
02:16:33
Speaker
This makes no sense. Why do you think that? Well, because throughput is just going to increase. Yeah, but there's no bottleneck anymore. So it makes...
02:16:45
Speaker
Economically, what you just said makes no sense, right? And then it just took a little while, like five more years, and that's exactly what happened. Exactly.
02:16:59
Speaker
I didn't know the margins were that large. Yeah, the cloud margins are really high. Man, that's insane. That is insane. So I'm curious then...
02:17:11
Speaker
When it comes to... well, we like to regulate stuff in Europe, right? In Germany especially. I'm wondering because, for example,
02:17:23
Speaker
you saw so much happening on the consumer side when it came to avoiding vendor lock-in. For example, when it comes to smartphones, they try to do everything so that Apple cannot keep forcing people to stay in their ecosystem. They want to allow people to have an easier time switching, for example, to Android.
02:17:46
Speaker
They forced them to adopt USB-C and Apple was upset. Wow, why? For whatever reason, it doesn't really matter. You know, those types of things.
02:17:57
Speaker
Then at one point it became about the Apple Watch, right? Or other smartwatch manufacturers being locked out of functionality within iOS, for example. So they want to create a fairer market where you, as an end user, have an easier time actually deciding who you want to go with.
02:18:19
Speaker
You can decide, okay, I want to buy from this vendor or from that vendor, right? And at least make the transition a bit easier. Now I'm wondering if the same is going to happen for cloud,
02:18:35
Speaker
Because I see a few signs that it is happening, where they don't want you to have such a hard time switching, maybe between Google Cloud Platform and AWS, or rather the other way around, because AWS is the massive one.
02:18:52
Speaker
Google Cloud, or Azure, tries everything to get the big business customers over, so that you have an easier time switching. I mean, I've seen some attempts with, you know, infrastructure as code and things like this, but it's not seamless.
02:19:11
Speaker
So I'm wondering if it's going to happen there. I've seen things moving a bit more in that direction, but of course slowly, because, hey, who wants to give up their 50% margin? Nobody.
02:19:22
Speaker
Nobody wants to do that. I'm wondering if the market is just going to force it. And then I'm wondering if, sooner or later, we're also going to see that with AI inference: you have your model, and at one point you can just decide, hey, I want to run it on this system or on that system.
02:19:46
Speaker
Do you think that's going to happen? I mean, not the day after tomorrow, for sure not. We have so many other problems to solve. But do you think that will come?

Future of AI Tool Development

02:19:57
Speaker
Maybe 20 years down the line? Regarding regulation, I'm not sure. I'm not a big fan of regulations.
02:20:04
Speaker
Yeah. I think the market should decide for itself, because there's always this kind of negative association with vendor lock-in. But vendor lock-in itself, also if you take a look at Apple, is not a bad thing per se.
02:20:20
Speaker
People usually don't mind being locked in if the thing they're locked into is really nice. It takes away all of the struggles, it provides very convenient usage, but then there needs to be a justifiable price for that.
02:20:39
Speaker
Yes. And I think this is where things start to get a little bit more complicated Especially in the B2B business. Because in the B2B business, when, for example, your business relies or it's is it's dependent on very good AI and you have you you run your your model workload somewhere in the clouds, you're always looking at ways on how to optimize this.
02:21:03
Speaker
So in the initial phase it might be perfectly fine to get vendor-locked-in, because the convenience and the scaling are of very high value to you. But I totally see companies that get beyond a certain scale, and they start...
02:21:18
Speaker
to set up their own infrastructure, basically servers, because they know they can save millions of dollars compared to running their workload on the hyperscaler infrastructure.
02:21:31
Speaker
So this is the point, you know, where some kind of switching takes place. And for example, it's totally fine to set up an environment where you run the majority of your workload on your own infrastructure, but when you have spikes, you just rent for a couple of days.
02:21:45
Speaker
For hours or days, you can rent compute at the hyperscalers. I think...
02:21:57
Speaker
It's hard to say. For me, it feels a little bit like...
02:22:05
Speaker
Reading tea leaves. Yeah, exactly, it's almost like that. The thing is, I don't know. I believe that there will be massive pressure to reduce the margin, because we see a huge increase in the demand for
02:22:30
Speaker
compute for AI workloads, whether it's inference or training. Both, right. And this usually creates a certain pressure on the market to become more efficient, and whenever markets become efficient, the margin drops. And the way it drops, in my opinion, is through an efficient distribution layer, because every hyperscaler out there has zero interest in creating an efficient market. Of course, right. So there's simply no way that, for example, Google or Microsoft or Amazon will start to build a distribution layer that fairly distributes workload across all the available GPU hardware out there. It won't happen. No, it won't happen. Therefore I see this as kind of an opportunity, right? It's a gap,
02:23:26
Speaker
which might or might not work out. I believe it will work out, because the demand will keep increasing, and it will increase to a point where you will probably deal with load spikes that even hyperscalers may not be able to handle ad hoc, on demand.
02:23:48
Speaker
So you need to distribute it fairly somehow. And this is, I think, the way we go there. Regarding inference... I hope we will see something similar for inference too. This is not our main target, because the way I think about it is that you need to specialize. I see a certain analogy in software development. In software development, you have these kinds of tools, right? You have tools for your IDE, tools for testing, tools for CI/CD, tools for integration testing, tools for A/B testing, tools for...
02:24:22
Speaker
scaling this thing up in the cloud to make it elastic depending on the load that you receive.
02:24:31
Speaker
There were companies that were trying to solve the whole chain, and they failed. And I think a very similar thing will happen with AI. So I think tools need to be focused on certain pain points and problems, and they need to solve those very well.
02:24:48
Speaker
And for us, this is the abstraction layer for deploying model training workloads on distributed compute hardware.
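As a rough illustration of what such an abstraction layer could look like from a user's point of view, here is a hypothetical sketch: a provider-agnostic job spec plus a greedy, price-aware placement over whatever GPU offers happen to be available. Every name in it (TrainingJob, GpuOffer, place, the example providers and prices) is invented for the sketch and is not the actual Exalsius API.

```python
from dataclasses import dataclass

# Hypothetical types; not an actual Exalsius or cloud-provider API.
@dataclass
class TrainingJob:
    image: str                 # container with the user's training stack
    command: list[str]         # entrypoint, unchanged across providers
    gpus: int                  # total GPUs requested
    gpu_type: str = "any"      # e.g. "H100", or "any" to let the scheduler pick
    max_price_per_gpu_hour: float = 2.0  # assumed cost ceiling in USD

@dataclass
class GpuOffer:
    provider: str
    gpu_type: str
    free_gpus: int
    price_per_gpu_hour: float

def place(job: TrainingJob, offers: list[GpuOffer]) -> list[GpuOffer]:
    """Greedy placement: cheapest eligible offers first until the GPU count is met."""
    eligible = [o for o in offers
                if o.price_per_gpu_hour <= job.max_price_per_gpu_hour
                and (job.gpu_type == "any" or o.gpu_type == job.gpu_type)]
    eligible.sort(key=lambda o: o.price_per_gpu_hour)
    chosen, remaining = [], job.gpus
    for offer in eligible:
        if remaining <= 0:
            break
        chosen.append(offer)
        remaining -= offer.free_gpus
    return chosen if remaining <= 0 else []  # empty list: not schedulable right now

# Usage: the same job spec could land on a university cluster, an idle company
# server room, or a rented hyperscaler node, depending on price and availability.
offers = [
    GpuOffer("uni-cluster", "A100", free_gpus=8, price_per_gpu_hour=0.6),
    GpuOffer("neocloud-x", "H100", free_gpus=32, price_per_gpu_hour=1.9),
    GpuOffer("hyperscaler", "H100", free_gpus=64, price_per_gpu_hour=5.5),
]
job = TrainingJob(image="myorg/train:latest", command=["python", "train.py"], gpus=16)
print(place(job, offers))  # picks the cheap offers, skips the one over the price cap
```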
02:25:00
Speaker
It might be the case that we will be required to specialize even more deeply on certain aspects of that, but we will see how users adopt our solution, how they use it, and then we will react to that. Yeah, you're right. I also don't necessarily like regulation, especially because...
02:25:22
Speaker
As you said, the demand most of the time forces the innovation, or the reduction in margin comes from another company. That's where the competition comes in. And yeah, we just have to make sure that's actually going to happen.
02:25:40
Speaker
That is quite interesting. I could talk with you about this for days. Yeah, probably. But let me just go with two more questions, okay? Okay. Because I realize we've already been talking for two hours. Let's start with this:
02:26:01
Speaker
I would like your opinion on the field at the moment. What do you think, which pressing questions still need to be investigated in the field of AI?
02:26:16
Speaker
And it can be anything, right? Things that might interest you, or things where you think, man, that's weird, why does it go like this? It can be market, can be product, whatever you think.
02:26:28
Speaker
Yeah, I'm still struggling to understand why developing AI, like training AI, is still so difficult on distributed environments, right? The thing that we're building, I mean, for me it seems so natural.
02:26:43
Speaker
And I'm wondering why there's no solution out there yet that just works the way I imagine it to work. That's maybe one part, and a little bit self-centric. But regarding the AI itself...
02:26:58
Speaker
I'm really curious to see the boundaries of this new reinforcement learning paradigm, because I see a huge variety of use cases that could be solved with it, right?
02:27:11
Speaker
Right now, these models are getting insanely good at coding. Yeah. Because coding is a verifiable task: you can just drop reinforcement learning on it and let the model figure out stuff until it finds a solution that works.
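To make "verifiable" concrete for code, here is a toy sketch where the reward is simply whether the model's program passes a set of unit tests when executed. The function names and the tiny test harness are invented for illustration; this is not any particular RL library.

```python
import subprocess
import sys
import tempfile
import textwrap

def code_reward(candidate_code: str, test_code: str) -> float:
    """Reward 1.0 if the model's code passes the unit tests, else 0.0.

    This is the sense in which coding is "verifiable": the signal comes from
    executing the program, not from another model or a human rater.
    """
    program = textwrap.dedent(candidate_code) + "\n\n" + textwrap.dedent(test_code)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

# Toy usage: a task ("write add(a, b)") and its hidden tests.
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(code_reward(candidate, tests))  # 1.0
```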
02:27:26
Speaker
The next thing, in my opinion, is what you provide within this chain-of-thought reasoning. Right now, it just outputs tokens and uses these tokens as context for the next chain of thought, and the next one, and the next one, until it arrives at some conclusion where it says, okay, I have thought about this enough, and then it gives it a shot, formulating the final answer to see if it's correct.
02:28:00
Speaker
And I'm wondering about, or very excited about, adding tools into this chain of thought, right? So basically, in this kind of agentic way of using AI, you not only let it reason over its own output, but also reason over the usage of tools
02:28:19
Speaker
and the results of these tools, and then integrate those results into the chain of thought. I mean, the problem space is exploding. And since it's exploding, you need even more compute, even more parallel execution of this stuff
02:28:36
Speaker
in order to run it at scale and help the model converge to something based on a tool set that you provide. This is, I would say, what I'm most excited about: to see the next generation of reinforcement learning on LLMs, with tools as part of the chain of thought.
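A highly simplified sketch of the loop being described: the model alternates between reasoning steps and tool calls, tool results are fed back into the context, and a verifiable reward at the end is what a reinforcement-learning update would train on. The model itself is stubbed out by a random placeholder, and names like generate_step, rollout, and the toy tool set are invented for illustration; no real training framework is implied.

```python
import random
from typing import Callable

# Hypothetical tool set the model may call during its chain of thought.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only
    "search": lambda query: f"(stub) top result for: {query}",
}

def generate_step(context: str) -> str:
    """Placeholder for the LLM: returns either a thought, a tool call, or an answer."""
    return random.choice([
        "THINK: break the problem into parts",
        "CALL calculator: 21*2",
        "ANSWER: 42",
    ])

def rollout(task: str, max_steps: int = 8) -> tuple[str, str]:
    """Run one chain of thought in which tool results become part of the context."""
    context = f"TASK: {task}"
    for _ in range(max_steps):
        step = generate_step(context)
        if step.startswith("CALL "):
            name, _, arg = step[len("CALL "):].partition(": ")
            observation = TOOLS[name](arg) if name in TOOLS else "unknown tool"
            context += f"\n{step}\nOBSERVATION: {observation}"
        elif step.startswith("ANSWER:"):
            return context, step[len("ANSWER:"):].strip()
        else:
            context += f"\n{step}"
    return context, ""  # ran out of steps without answering

def reward(answer: str, expected: str) -> float:
    """Verifiable reward: exact check against a known result."""
    return 1.0 if answer == expected else 0.0

# One RL-style iteration: sample rollouts, score them, and (in a real system)
# update the policy toward the high-reward trajectories.
trajectories = [rollout("What is 21 * 2?") for _ in range(4)]
scores = [reward(answer, "42") for _, answer in trajectories]
print(scores)  # e.g. [1.0, 0.0, 1.0, 1.0], depending on the random placeholder
```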
02:28:58
Speaker
Yeah, damn. And then I would also like to see how far the use of certain tools generalizes to other ones, right? Because of course you can then use the output of verifiable things to retrain the model.
02:29:18
Speaker
And I agree. I think most people did not grasp the explosion this step can cause. Exactly. Imagine you just drop an AI into an environment. This environment consists of, let's say, tasks that are verifiable, but it also provides the model with virtual...
02:29:43
Speaker
tools that it can use in order to solve the task better, or to solve it at all. And the AI should figure out how to do things there. I mean, the idea is not entirely new, right? You can just look at reinforcement learning and see how these models play these kinds of small games, and there they get insanely good. Or AlphaGo. AlphaGo, yeah. Chess AI, same. So it was proven that this is a really nice thing to do in order to push the boundaries of the capabilities of your AI.
02:30:16
Speaker
And combining this with LLMs, I think, might be really nice. Yeah.

Conclusion and Final Thoughts

02:30:25
Speaker
I'm also curious. That's super interesting. I'm curious where that will go.
02:30:30
Speaker
All right, last question. Where can people find you if they want to reach out to you? If they think this is super crazy, if they think you're wrong and want to give you their reasoning, or want to give you tips, or say, hey, I have a data center.
02:30:47
Speaker
Maybe we can be hooked up. We don't have an API yet, or maybe we do. Where can they find you? Where can they reach out to you? Yeah, we have a web page that is set up, excelsius.ai.
02:31:01
Speaker
So basically, just drop us a message there and we reply very, very quickly. That's great. That's one thing. Also LinkedIn: just find Alexander Acker on LinkedIn, send me a connection request, and we can chat about AI or infrastructure or whatever.
02:31:21
Speaker
I'm really open to connecting with engineers or interested people out there. Yeah, for sure. It's always nice to chat. I mean, that's the way we got in touch, right? Yeah, definitely. So I will put all of this in the show notes and on the website so people don't have to write it down.
02:31:39
Speaker
Well, Alex, thanks so much for your time. I know you're super busy. Was there anything else that I maybe forgot to mention, anything you want to tell the listeners, or something on your mind?
02:31:52
Speaker
Not at all. I enjoyed it very much. It was a great discussion. I'm sure this won't be the last time, I bet. Yeah, I hope so. It was really nice talking to you. Well then, to everyone listening, have a great day. Yeah.
02:32:05
Speaker
Hey everyone, just one more thing before you go. I hope you enjoyed the show. To stay up to date with future episodes and extra content, you can sign up to the blog, and you'll get an email every Friday that provides some fun before you head off for the weekend.
02:32:19
Speaker
Don't worry, it'll be a short email where I share cool things that I have found or what I've been up to. If you want to receive that, just go to adjmal.com, A-D-J-M-A-L dot com, and you can sign up right there.
02:32:34
Speaker
I hope you enjoy it.