
Deep Dive: How We Got to GPT (and What's Next) with Vinay Sankarapu (Arya.ai- An Aurionpro Company)

Founder Thesis

"When people think of AI, they think it's probably happened in the last 10 years or 20 years, but it's a journey of 70 plus years."  

This quote from Vinay Sankarapu challenges the common perception of AI as a recent phenomenon.

Vinay Sankarapu is the Founder & CEO of Arya.ai- An Aurionpro Company, one of India's pioneering AI companies established in 2013. An IIT Bombay alumnus, Vinay led Arya.ai to become a profitable (EBITDA positive for 3+ years) enterprise AI player focusing on the BFSI sector, achieving significant scale (~₹50-100 Cr revenue range) before its acquisition by Aurionpro Solutions. He was named in Forbes 30 Under 30 (Asia) and is now also leading AryaXAI, focusing on making AI interpretable and safe for mission-critical applications.  

Key Insights from the Conversation:

👉 AI's True Timeline: Unpacking the 70+ year evolution from Alan Turing to modern LLMs.

👉 Core Concepts Demystified: Neural Networks, Deep Learning, Backpropagation, CNNs, RNNs, Transformers explained simply.

👉 The Deep Tech Journey: The challenges and pivots involved in building an early AI startup in India.

👉 Explainability is Key: Why making AI understandable (XAI) is critical for trust and adoption, especially in regulated industries.

👉 Future of AI: Insights into AI agents, verticalization moats, responsible AI governance, and the changing SaaS landscape.

#AI #ArtificialIntelligence #MachineLearning #DeepLearning #AIHistory #TechHistory #NeuralNetworks #LLMs #Transformers #ExplainableAI #XAI #ResponsibleAI #AIEthics #AIGovernance #StartupIndia #IndianStartups #Entrepreneurship #FounderThesis #DeepTech #BFSI #FintechAI #podcast   

Disclaimer: The views expressed are those of the speaker, not necessarily the channel.


Transcript

AI Origins and Theories

00:00:00
Speaker
When people think of AI, they think it probably happened in the last 10 or 20 years, but it's a journey of almost 70-plus years, right? Then came the 2012 AlexNet paper.
00:00:14
Speaker
This was the first time a GPU was used. So for all the guys who invested in Nvidia stock, this is your origin story. You have so much information that there are no experts anymore.
00:00:27
Speaker
This layer as a micro-SaaS play will not be there anymore. This SaaS is definitely dead. So let's start with an oral history of AI.
00:00:38
Speaker
Who coined this term? How did this whole term of artificial intelligence come about? What's been the evolution of AI?
00:00:46
Speaker
Yeah, yeah. I mean, when people think of AI, they think it probably happened in the last 10 or 20 years, but it's a journey of almost 70-plus years, right?
00:01:07
Speaker
The first evolution of machine intelligence is as early as, or even before, the evolution of computers. That's the 1940s, when Alan Turing was working on building machines to break Enigma.
00:01:26
Speaker
If you have seen The Imitation Game, that is exactly the origin of AI at that point of time. The thesis behind that approach is: okay, fine, you have a bunch of cryptic messages flowing between systems, which is Enigma.
00:01:45
Speaker
And then you have multiple versions of Enigma as well. The Nazis had this very famous encryption device called the Enigma machine, through which they would plan World War II: where they were attacking, where they were deploying their submarines, etc. For the Allies, it was important to crack the code of the Enigma machines, and Alan Turing was the one who cracked it. So just a little bit of background for the listeners. Yeah.
00:02:11
Speaker
So please continue. Absolutely. So that was the background, right? The point is that they wanted to scale the encryption and decryption of these messages, because there were so many messages flowing and so many more configurations of Enigma.
00:02:28
Speaker
Manually, they could not scale it up. So that's when Turing came up with the concepts of "can machines think, can machines solve?" That is the fundamental theory of AI at that point of time, right? Yeah.
00:02:40
Speaker
I think, post that, he followed up with the machine called the Bombe, B-O-M-B-E, or something

Challenges and Breakthroughs in AI

00:02:48
Speaker
of that sort. I forget the exact spelling of the name of the machine Turing created. I mean, if you watch the movie, it's the rotors that move, tut, tut,
00:02:57
Speaker
and try to decrypt a message or a setting. That is exactly the first computer that people ever created. Post that, it became the post-World War era. So this got into the mathematics. Turing also created the Turing test, right? Correct.
00:03:17
Speaker
For a long time, it used to be the gold standard for AI. Of course, now even a basic LLM can easily pass the Turing test. But the Turing test was essentially this: if you are talking to somebody you don't know and you cannot distinguish whether you're talking to a machine or a human being, that is the Turing test.
00:03:42
Speaker
Correct. Again, all of this came during that period, right? The World War was done. Now people needed to expand this theory. People had heard about this theory from the small community at Bletchley Park, which was the MI6-slash-cryptographers' establishment at that point of time, responsible for decrypting messages across the war.
00:04:08
Speaker
That news spread, and people said, okay, fine, let's expand and double-click on that theory. People started gathering as a small community to discuss and evolve this further.
00:04:19
Speaker
So this was when AI as a term was coined, in the 1950s: fine, can we give intelligence to machines? If so, what is the approach? How do we test it? That's when Turing came up with the Turing test as one of the methods to test intelligence. So that actually marks the beginning of AI.
00:04:43
Speaker
In fact, I would actually mark this era itself as the theory of AI, because a lot of the fundamentals of what we are doing today, the theory and the math of it, were largely discussed at that point of time. In the last 70 years, our learning has primarily been about how to implement a concept called deep learning. So that's what the evolution is.
00:05:12
Speaker
But the initiation happened around then. In fact, when it started, the approach was the same. Fundamentally, what people would think is: okay, how does a brain work? Can I replicate a similar learning behavior in machines?
00:05:26
Speaker
That was the initial approach. That was the background for artificial neural networks, right? So that was around 1949 or 1950, I think, when the neural network as a concept was introduced, saying, okay, here is how a neural network should be.
00:05:45
Speaker
And around the 1950s, people started defining a mathematical function for this, called the perceptron. When you combine these perceptrons into layers, that becomes a neural network.
00:05:58
Speaker
So people started building small-scale neural network machines using XOR-like circuit maps, where you have on-and-off kinds of learning machines, to do a simple task. So this was how they started doing it.
00:06:18
Speaker
This was way early. Yeah. What is a neural network? Can you zoom in a little on that? Sure. So if you look at any biological brain, your brain is made of a bunch of neurons. A neuron is like a learning function inside a biological brain.
00:06:37
Speaker
The learning is the interaction between these neurons, where you have electric signals triggering between the neurons. That becomes the learning in our brain; it is stored in our brain in some places.
00:06:49
Speaker
And incidentally, in the brain there is a lot of localization, right? A certain part of the brain is responsible for vision-related tasks, a certain part is responsible for hearing-related reasoning, and all that stuff.
00:07:07
Speaker
So this is how a biological brain is made, right? That's the early inspiration: when people thought, let's build AI, they thought, let's replicate biological brains and build that into machines.
00:07:20
Speaker
This was a time when there was no scaled computer, right? In the 1950s you didn't even have a computer. So they were fundamentally thinking of a math, a biologically inspired math, and trying to create a computer out of it.
00:07:35
Speaker
So thereby, they were introducing a lot of concepts to understand how a neural network works. The perceptron is one bit of evolution there.
00:07:46
Speaker
And then you have something more important called backpropagation. Backpropagation is the learning optimization, meaning: in your neural network, when you pass information forward, backpropagation adjusts and corrects the learning inside the network.
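The forward-pass / backward-correction loop described here can be sketched in a few lines. This is an illustrative toy, not anything from the conversation: a single neuron with one weight and one bias, trained by pushing the prediction error back into the parameters along the gradient.

```python
# Toy backpropagation: one neuron computing pred = w*x + b, trained to
# fit the line y = 2x + 1. The forward pass makes a prediction; the
# backward step corrects w and b using the gradient of the squared error.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # samples of y = 2x + 1
w, b = 0.0, 0.0
lr = 0.1  # learning rate

for epoch in range(200):
    for x, target in data:
        pred = w * x + b          # forward pass
        error = pred - target     # how wrong we were
        # backward pass: dLoss/dw = 2*error*x, dLoss/db = 2*error
        w -= lr * 2 * error * x
        b -= lr * 2 * error

print(round(w, 2), round(b, 2))  # converges toward 2.0 and 1.0
```

The same act-and-correct loop, applied to millions of weights through many layers via the chain rule, is what the full backpropagation algorithm does.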
00:08:03
Speaker
So the theory of backpropagation was introduced at that time. But again, most of these were theories at very early stages. People had discussed them; there was a bunch of frontier theory, theory without limitations.
00:08:19
Speaker
When they started implementing it, that's when the limitations came into play.

AI Commercialization and Democratization

00:08:24
Speaker
The primary limitation was compute itself. As I said, it was not even an era of consumer computing, forget about high-scale computing at that point of time.
00:08:34
Speaker
People tried to implement these theories and logics on small-scale, minimally capable machines. But the theory promises a lot of vision, right?
00:08:48
Speaker
Saying that, you know, if we do this, then we can achieve this and that. This again spiraled a lot because of the early successes of the Bombe, the Turing machine.
00:09:00
Speaker
People thought they could achieve similar kinds of successes on multiple problem statements, but it was not able to scale as well as what people had thought at that time.
00:09:11
Speaker
So there were a bunch of early-stage experiments as well. There was a small robot built, a small fully autonomous robot as well. But again, these were very lab kinds of tools.
00:09:25
Speaker
And most of this research was primarily funded by public organizations. Early DARPA was actually focusing on the evolution of AI, for example.
00:09:38
Speaker
The UK government was investing a lot in AI. DARPA is the US Defense Advanced Research Projects Agency, something like that is the full form. It's a part of the US military.
00:09:51
Speaker
Correct. So this was largely publicly funded. If you look at the trend line, you had an early success during the era of the wars.
00:10:03
Speaker
Now they wanted to commercialize, create, and scale up those concepts. There was reasonable funding for the next 20-odd years, from the 1940s to the 1960s, as well.
00:10:15
Speaker
There was a bunch of reasonable funding backing the idea of neural networks primarily, largely neural networks as the idea to build AI. But then in the 1970s came the first AI winter.
00:10:29
Speaker
There was this report called the Lighthill Report, published in the UK at that time, which evaluated and looked at all these projects. And it said: this is not at all scalable. The promise that building AI is scalable?
00:10:44
Speaker
Absolutely not reachable. All of this is pretty much theory. It was a very harsh report on a bunch of AI projects at that point of time. So this was the beginning of the first AI winter, right?
00:10:56
Speaker
The theory promised so much that there was a bunch of funding, and then it got checked by reality when they started implementing it.
00:11:07
Speaker
And as it was largely backed by public funding, they were answerable, eventually, to the public and the government. So they just simply stopped funding many of these ideas.
00:11:20
Speaker
This just stopped a lot of the experimentation that was happening at that time, right? And this winter lasted almost the next decade.
00:11:32
Speaker
But again, the fascination with building intelligence was so strong. People had seen 2001: A Space Odyssey during the same time, the 1960s.
00:11:46
Speaker
There was so much fascination around sci-fi. But nonetheless, I think we saw the second AI wave alongside the commercial evolution of personal computing, right?
00:12:04
Speaker
In the late 1970s and 1980s, when Apple and IBM were focusing on personal computing, people started looking at it as: fine, instead of building AI-specific computing, let's focus on personal, consumer-focused compute.
00:12:22
Speaker
So the 1980s are when, again, there was a large amount of focus on compute. People started building computers. Now experiments like this didn't require a large amount of funding, right?
00:12:34
Speaker
This marked a lot of democratization in AI research. It had earlier been largely done by certain universities with a lot of backing from the government.
00:12:45
Speaker
But with compute being available to some extent, and a little bit cheaper compared to the 1960s and 70s, there was a lot of democratization of experimentation. People started trying out multiple experiments, theories, and approaches.
00:13:03
Speaker
So this was the second wave of AI. The early success was expert systems. This was the early AI commercialization as well.
00:13:17
Speaker
So far, we had not seen much implementation of AI in large-scale commercial settings. But during the 1980s, there were expert systems.
00:13:28
Speaker
An expert system is this: instead of building learning systems, you give a bunch of rules and build an expert system for a specific process. I think there was a system called XCON, which was built during that time.
00:13:44
Speaker
It was largely trying to package a bunch of computer orders. Because of the boom of computers, people were ordering a lot of stuff.
00:13:58
Speaker
So the ability to package and create the right combination of these at scale was also a problem. People built these expert systems and solved the problem at scale. They were able to scale up a lot of, you could call it, minimal automation.
00:14:16
Speaker
They were able to automate a bunch of things. And expert systems as a theory and an approach people even use today, right? Even today, for example, if you look at places like financial services,
00:14:31
Speaker
you have rule-based fraud systems or rule-based underwriting systems, which are essentially expert systems. They identify a bunch of rules or guidelines required to accept a transaction, and they validate the transaction against these guidelines or rules, for example.
00:14:54
Speaker
As long as the transaction satisfies them, it goes through as straight-through processing, right? So that's where you have a bunch of rules being used as an underwriting system today. So yeah, the expert system as a concept is quite used even in today's era as well.
00:15:14
Speaker
What is the difference between an expert system and an algorithm?
00:15:20
Speaker
In an expert system, you define the contours manually, right, for each and every step. You can think of it as something like a decision-tree process. We are double-clicking here, but let me explain from an approach point of view. Let's say you have a decisioning system, an expert system,
00:15:43
Speaker
to underwrite a transaction. You follow a set method saying: if this, then that; if this, then that. You have a bunch of if-this-then-that rules, right? These rules, people typically define manually.
00:15:54
Speaker
For example, let's say your FICO score is less than 700. Typically, if it's greater than 700, I'll go to the next set of rules. Now it is less than 700. So manually, I'll have to identify: if it is less than 700, what is the next set of risk criteria I have to check before I go to the next step?
00:16:13
Speaker
These contours or nodes, an expert, a human, will identify manually and define as rules. Sometimes these can be a little abstract. That's what happened in recent times, say the last couple of decades, in financial services.
00:16:30
Speaker
So instead of simply defining that as an if-then rule, it could be an observation from analytics, for example. But even then it is manually defined, right? This is a rule-based expert system.
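The FICO-style if-this-then-that flow described above could look like the following sketch. The rule names and thresholds are invented for illustration; no real underwriting policy is implied.

```python
# A miniature rule-based "expert system" for underwriting, in the
# if-this-then-that style described above. Every contour is hand-written
# by a human expert; nothing here is learned from data.

def underwrite(application: dict) -> str:
    # Rule 1: hard cutoff on credit score (threshold chosen for illustration).
    if application["fico"] < 700:
        # Rule 2: below the cutoff, apply a stricter risk criterion.
        if application["debt_to_income"] > 0.35:
            return "reject"
        return "manual_review"
    # Rule 3: above the cutoff, only a basic affordability check remains.
    if application["requested_amount"] > 10 * application["monthly_income"]:
        return "manual_review"
    return "approve"  # straight-through processing

print(underwrite({"fico": 720, "debt_to_income": 0.2,
                  "requested_amount": 5000, "monthly_income": 4000}))
```

Every branch point is a decision a human wrote down, which is exactly why such systems are hard to scale to every process.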
00:16:43
Speaker
With an algorithm, I don't need to define this manually. Can I have a system do this automatically, in a manner that achieves the same output? The replica of that same expert system is something like gradient-boosted-tree-based models.
00:17:01
Speaker
Like the XGBoost model, which is the most-used classic ML technique across different use cases. In XGBoost, instead of defining the decision tree manually, the algorithm tries to identify where it should segment the data and create the next set of tree functions, so that it can achieve better prediction with a lower error rate.
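The contrast being drawn, the algorithm finding the segmentation itself, can be illustrated without any ML library: a one-level decision "stump" that scans candidate thresholds and keeps the one with the lowest error. XGBoost is far more sophisticated (gradients, regularization, many trees); this sketch with invented toy data only shows the core idea of an automatically chosen split.

```python
# The core idea behind tree learners, reduced to a single split:
# instead of a human picking "score < 700", the algorithm scans every
# candidate threshold and keeps the one that misclassifies the least.

def best_split(xs, labels):
    """Return (threshold, error_count) for the rule:
    predict 1 when x >= threshold, else 0."""
    best = (None, len(labels) + 1)
    for t in sorted(set(xs)):
        errors = sum(int((x >= t) != y) for x, y in zip(xs, labels))
        if errors < best[1]:
            best = (t, errors)
    return best

# Toy data: credit scores with a true decision boundary near 700.
scores = [560, 610, 650, 690, 705, 730, 780, 820]
approved = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, err = best_split(scores, approved)
print(threshold, err)  # the scan finds 705 with 0 errors
```

A boosted-tree model repeats this kind of search recursively and over many trees, each new tree correcting the errors of the previous ones.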
00:17:26
Speaker
So the algorithm does a similar thing in a fully automated way, whereas in expert systems, you are trying to define everything manually. That is the big difference. Now, this was also the problem, right? At that time, in the 1980s, the algorithms were not that mature.
00:17:41
Speaker
People thought the expert system was the way to achieve this. They said, okay, fine, why don't I define an expert system for every given process in the entire world? But you can't define that manually. It's very hard to scale up.
00:17:54
Speaker
But again, by the early 1990s, there was a revived amount of investment and interest, post the AI winter of the mid-1970s. People started investing again in AI.
00:18:10
Speaker
But now it was not just driven by neural networks. There were now two classes of experimentation.
00:18:22
Speaker
And in fact, a major amount of experimentation was actually around the second class, the expert-based systems, not the neural networks. Neural networks had become very much a taboo at that point of time.
00:18:36
Speaker
People thought these theories don't scale up from a neural network standpoint, because there were a lot of intrinsic limitations to implementing neural networks. So they started investing elsewhere.
00:18:48
Speaker
Yeah. What is the IT manifestation of a neural network? You told me the brain has neurons which talk to each other, and then there is localization: say, your speech is in one area of the brain, and all the neurons in that area specialize in speech. How is this manifested in the software world? Is it like a bunch of chips representing neurons? No, no, it's much more fundamental, actually. As I said, the 1950s is when the definition of the perceptron came into the picture, right?
00:19:23
Speaker
A perceptron is one neuron. One neuron is a perceptron, and when you combine multiple perceptrons, it becomes a neural network. And what does a perceptron do? A perceptron is very much a simple linear function.
00:19:34
Speaker
You can make it as simple as f(x) = σ(w·x + b), where w is the weight and b is the bias. So let's say you have one, two, three, four neurons in one layer and one, two neurons in the next, right?
00:19:50
Speaker
How much weight do you allocate to each neuron, and how much bias? That weight and bias allocation happens over time, as data flows from one neuron to another.
00:20:06
Speaker
It's simple math trying to optimize an end function, where you have a weight at each node and a bias at each node. So a perceptron is just like a...
00:20:18
Speaker
Mathematical function. There's some data in and data out. And a bunch of perceptrons together is a neural network.
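The perceptron just described, f(x) = σ(w·x + b) with a hard threshold, can be written out directly. This toy, with all numbers chosen for illustration, uses the classic perceptron learning rule to learn the logical AND function.

```python
# A single perceptron: weighted sum plus bias, passed through a step
# activation. Trained with the classic perceptron rule to learn AND.

def predict(weights, bias, x):
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0  # step activation: "on" or "off"

# Truth table for AND: output is 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):  # a few passes suffice for this separable problem
    for x, target in samples:
        error = target - predict(weights, bias, x)
        # perceptron update: nudge weights toward the correct answer
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
        bias += lr * error

print([predict(weights, bias, x) for x, _ in samples])  # [0, 0, 0, 1]
```

A single perceptron can only draw one straight boundary, which is why stacking them into layers, the neural network, became necessary for harder functions like XOR.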

Neural Networks and Image Classification

00:20:28
Speaker
It's not just a bunch of perceptrons together; those perceptrons are also given weights. Like some perceptrons are more important for some work.
00:20:36
Speaker
Yeah. And that allocation of weights is what the algorithm needs to do, right? What weight to give to which neuron at what point of time. This is exactly what happened over the last 70 years. In retrospect, we always knew that deep learning, meaning neural-network-based systems, was the approach, but we never knew how to implement it properly.
00:21:01
Speaker
That is what happened over the last 70 years. Early on, there was the theory, and then the perceptron was introduced in the 1950s. And then backpropagation. Backpropagation is how the data flows forward and the error flows backward, so that the weights get corrected and thereby the model starts learning. Learning is nothing but: you act and react, right? And you do it a little differently next time.
00:21:29
Speaker
That's what learning is. Backpropagation is feedback, basically. Exactly. Backpropagation is feedback: this was a wrong answer, so let me go back and see which perceptron was responsible for it. Yes. And adjust the weights and everything.
00:21:43
Speaker
Okay. Yes. So this happened in the 1950s, right? This is almost 60, 70 years of theory. So that's what I was saying: in the early AI era, it was largely theory around neural networks.
00:21:59
Speaker
And then we hit a wall because of a lot of things, primarily compute, which was one big problem; the second was data. These were the two main problems.
00:22:10
Speaker
This was the 1970s, right? So then people thought: why invest only in this approach? There is another approach, expert-based systems, which had an early success in the 1980s. Why not invest in that and see how well it can scale, right? Right.
00:22:26
Speaker
So the theory and investment around neural networks were very much reduced in the 1980s, because of what had happened in the 30 years before.
00:22:36
Speaker
It became very much a taboo. Very, very few people continued the research on neural networks; they were called the bonkers of academia at that point. Yeah.
00:22:50
Speaker
But during that time, a lot of fundamental concepts were rechecked and introduced. For example, you had Yann LeCun working in the 1980s on CNNs.
00:23:10
Speaker
Even though backpropagation was introduced early on, it was not fully implemented in a neural network until the 1980s. In the early 1980s, there was a Japanese researcher, Fukushima, I think, if I'm not wrong.
00:23:28
Speaker
He implemented a backpropagation neural network in the early 1980s, which was extended in the late 1980s by Yann LeCun, by stacking these layers into CNNs, convolutional neural networks.
00:23:43
Speaker
It's a new class of neural networks for image classification tasks, in this case digit recognition. In the early 90s, if you look at the check processing being done in the US, it was automated to some extent because of this idea.
00:23:59
Speaker
This OCR, optical character recognition, technology was based on this CNN technology. Yeah. So there was CNN, to some extent, and there were also, as I said, the classic expert systems, right? People expanded those expert systems into classic ML in the late 1990s. Can you zoom in a little bit on CNN? What did you say the full form is, convoluted neural network? Yes, convolutional neural networks.
00:24:28
Speaker
Convolutional. Okay. Yes. So in neural networks, that's the idea, right? If I combine these neurons in a different kind of manner, I will have a different kind of learning behavior.
00:24:43
Speaker
For example, if I stack them one after the other continuously, that's a recurrent neural network. This was again introduced around the same time, the 1980s.
00:24:55
Speaker
I stack one block next to another, propagate the learning from one block to the next, and so on, and then I try to learn. That is one type of network. Then there is the convolutional network, which is typically used for images, where you have a three-dimensional matrix, RGB. That's what an image is defined as. You have a bunch of pixels, and each pixel is pretty much a node at this point.
00:25:24
Speaker
So you have, let's say, that much depth, and you have 1D or 2D, typically 2D, convolutions. You have one layer looking at your entire image dimensions.
00:25:38
Speaker
Then you have another block of the network with fewer dimensions, so the learning starts isolating into learning certain kinds of behaviors. You stack them up like that in a certain order and create a learning or optimization function between these layers, so that the learning flows from one to another.
00:26:00
Speaker
I'm talking about a very advanced concept, but at that point of time, all of this was maybe less than 10-by-10, you know, 4-by-4 kinds of neuron grids.
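The layer-over-image sliding described here boils down to the convolution operation: a small kernel slides over the image and produces a smaller feature map. A minimal valid-mode 2D convolution in plain Python, on a toy 4x4 "image" with invented values:

```python
# Minimal 2D convolution ("valid" mode): a small kernel slides over the
# image; each output cell is the sum of element-wise products. Stacking
# such layers, with shrinking spatial size, is the essence of a CNN.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 "image" with a vertical edge down the middle...
image = [[0, 0, 1, 1]] * 4
# ...and a 2x2 kernel that responds to left-to-right intensity change.
edge_kernel = [[-1, 1], [-1, 1]]

print(conv2d(image, edge_kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output lights up exactly where the edge is; in a trained CNN, the kernel values themselves are the weights learned by backpropagation rather than hand-picked as here.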
00:26:11
Speaker
Scaling it up is where people had a bunch of challenges. In the early stages, if I scale to that many neurons and I don't know how to do, say, gradient optimization, then eventually it will not learn anything. I'll simply have a bunch of neurons that don't learn anything, and it will give me a wrong result.
00:26:33
Speaker
This is where backpropagation was one idea, and then there were things like gradient boosting, sorry, gradient implementation, another one, which was introduced by Hinton later, in the 2000s, around DBMs.
00:26:49
Speaker
But coming back, taking a step back: if you look at the 1980s to 1990s, one class of thought was around expert systems, where there was an early success and then a bust.
00:27:03
Speaker
Again, expert systems were also not scalable. Neural networks were not scalable either, but neural networks didn't have any kind of pressure. Only people with very strong beliefs kept investing in neural networks.
00:27:20
Speaker
The next phase is the important one: the 1990s until the 2010s. For almost the next 20 years, slow innovation happened one step after another, primarily around classic ML. You started having algorithms like SVMs, support vector machines.
00:27:46
Speaker
You have random forest models, classic ML techniques evolving. This was because of the evolution of both data and computers. Computers became mainstream.
00:28:01
Speaker
There were a bunch of computers. There was a huge amount of data. Dot-com picked up a lot. All of this gave a little more resources than what there was before the 1990s, for example.
00:28:17
Speaker
And these classic ML systems started seeing successes, one after the other, at a very small scale, primarily in financial services. People started building things like expert systems in fraud and expert systems in underwriting.
00:28:37
Speaker
In financial services, this scaled very well. And then statistical modeling: people started implementing that in algo trading from 2000 onward. Even today, algo trading is a big, big area, right? Yeah.
00:28:53
Speaker
On the consumer side of things, you had things like search engines, which again use these basic, fundamental, classic machine learning concepts.
00:29:04
Speaker
What also happened from the 1990s is that AI was no longer looked at as a one-department kind of approach.
00:29:16
Speaker
People started diversifying the approaches, largely specific to data sets as well. Like for image data, there is OCR, which picked up a lot from the 2000s as well.
00:29:29
Speaker
Even today, OCR is the biggest thing from a computer vision standpoint, in terms of enterprise implementation of computer vision, for example. And then you had the search engine, which actually reinforced a lot of investment in NLP.
00:29:45
Speaker
Because how you type in the keywords and how well you can retrieve the information was the main focus for people like Google in the 2000s: how fast and how accurately they could retrieve the information.
00:30:02
Speaker
That gave an introduction to a bunch of NLP-related problems. Because now it's not just tabular data, right? Until the 1990s and into the 2000s, it was largely tabular-data-heavy work.
00:30:18
Speaker
Because the early adoption of computers was around tabular data sets, right? Like classic accounting, for example, or stock markets. It was a tabular-data-heavy industry up to the 2000s.
00:30:31
Speaker
Computer vision existed to some extent because that was the early digitalization era. People started scanning things and storing images as files. So there were early computer-vision-related tasks.
00:30:45
Speaker
The 2000s were the beginning of the NLP era, because of the search-engine evolution, for example. So net-net, until the recent wave of the 2010s, until that point, which is the 1990s to 2010,
00:31:04
Speaker
it was an early era of slow innovation. A lot of things started happening very slowly, very systematically, without a large amount of hype around AI.
00:31:17
Speaker
There was no hype wave for almost the 20 years between the 1990s and 2010s, which was actually helpful, because there was no overflow of funding and no overflow of experiments.
00:31:32
Speaker
The experiments were driven by organic research, one after the other. For deep learning in particular, this is fundamentally important, because even though the concepts were introduced in the 1950s and 60s, people only started implementing those concepts later, like backpropagation in the 1980s, for example.
00:31:53
Speaker
Then there was the early introduction of CNNs. And the CNNs were scaled up again by Yann LeCun's team in 1998 as part of the Bell Labs research lab.
00:32:06
Speaker
They again scaled the computer vision problem around image classification at that time. And that happened slowly, also because the image data was accumulating.
00:32:20
Speaker
Now, if you see the evolution of data, unstructured data like images is getting bigger in quantum. Text is getting bigger. Tabular data is largely there, and there is a bit of monetization happening on tabular data because of the expert systems of classic ML.
00:32:39
Speaker
Whereas the CV, the computer vision part, and NLP were very underserved. We had more data than monetization capability around it.
00:32:50
Speaker
There was the search engine for NLP. For images, there was largely nothing except OCR. This is what triggered many people to say: okay, fine, you have more and more image data, more and more text data.
00:33:06
Speaker
So how should we start looking at this? And this looked like a very good scale problem, right? Because you had good enough data, a good amount of diversity, and there was a reasonable amount of commercial interest around these areas.

Impact of ImageNet and AlexNet

00:33:20
Speaker
So this is when, in the early 2000s, the ImageNet data project started, in 2006, if I'm not wrong.
00:33:32
Speaker
It was later completed in 2009. Fei-Fei Li, one of the well-known researchers in AI, was at Stanford, if I'm not wrong, when this project was started.
00:33:45
Speaker
So the idea is: fine, if you want to solve the computer vision problem, you need good data where you can build models, experiment, and create a benchmark.
00:33:57
Speaker
Yes, create a benchmark, right? You need a very strong benchmarking metric to be able to scale the algorithms. And for images, the dataset was pretty much non-existent.
00:34:10
Speaker
So this lab, Fei-Fei's lab, took that initiative. They started labeling 1,000 classes, 1,000 classes as in 1,000 different concepts.
00:34:21
Speaker
In those thousand different concepts, you had elephants, flowers, objects, a lot of things. The idea is they had somebody do it manually, or to some extent augmented, with very crude augmentation.
00:34:36
Speaker
They look at an image and they label it: okay, this is an elephant. They did that for a million-plus images across those thousand classes. Look at the effort, right? If this data had not existed, I don't think we would have had this much research around neural networks at that point in time.
00:35:00
Speaker
So the ImageNet data got introduced, and people started experimenting with algorithms on the ImageNet data. Now, this is like a classic movie story, right?
00:35:12
Speaker
We had the initial hero introduction, which was in the 1950s, where neural networks were the way to solve it. Then the classic problem in a movie: your theory is great, but the implementation is poor.
00:35:29
Speaker
It's because the world never caught up with it. You had certain things which the world didn't have yet. But a lot of self-skilling was happening, right? Like how to implement neural networks, how to do it effectively, how to do backpropagation.
00:35:45
Speaker
What are the different network architectures that you can come up with, like CNNs and RNNs? All of them were introduced in the early 1990s, for example. Now, the theory was there to some extent. The implementation was getting there.
00:35:59
Speaker
Now we needed a problem statement to prove whether we were successful or not, right? ImageNet was that dataset where people started benchmarking different approaches.
00:36:11
Speaker
Then came the 2012 AlexNet paper. This is the big... Well, one question. How is ImageNet a problem statement? Like, people would...
00:36:23
Speaker
...run their algorithm on the images to see if their labels match the human labels? Was that the problem? Yes. The ImageNet competition is a classic competition. Annually, they look at all submissions and rank which submission has scored highest on this performance.
00:36:42
Speaker
So you had human labels. Now you benchmark those against your machine-predicted labels and see whether the machine predicted accurately as compared to the human labels or not. Until 2009, '10, '11 as well, the error rate was around 25%, 30%.
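The scoring the speaker describes can be sketched in a few lines (the labels below are invented for illustration; the actual ImageNet challenge ranked submissions by top-5 error over 1,000 classes and a million-plus images):

```python
# Hypothetical ground-truth labels from human annotators, and a model's guesses.
human_labels = ["elephant", "flower", "car", "elephant", "dog"]
model_labels = ["elephant", "flower", "dog", "elephant", "cat"]

# Error rate: the fraction of images where the machine disagrees with the human.
errors = sum(1 for truth, pred in zip(human_labels, model_labels) if truth != pred)
error_rate = errors / len(human_labels)
print(f"error rate: {error_rate:.0%}")  # 40% on this toy set
```

Rank every submission by this number, and the annual competition the speaker mentions falls out directly.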
00:37:00
Speaker
So again, they were doing things like manually defining certain features. This is again, to some extent, an expert kind of system. If it has two eyes and four legs, then it's an animal. Exactly, something like that.
00:37:15
Speaker
Okay. We used techniques like edge detection first. So you use some techniques to do edge detection, and then post edge detection, you try to identify multiple things, like what the color is, the dominant color.
00:37:31
Speaker
You write different routines for different aspects of it, right? That's what I said: if-this-then-that, but the approach was statistically driven.
00:37:42
Speaker
Edge detection done by some statistical model. And then you go to the next step, and the next step, you summarize it and then say: okay, this probably could be an elephant, this probably could be something else. So till 2011, the error rate was around 25%, 30%.
00:37:57
Speaker
Which means it's okay, it's not great, but it's not at all reliable, right? Accuracy is only 70, 75%. A 25-30% error rate is not great. Then came the AlexNet paper. The error rate suddenly dropped to around 15% at that point in time. 15, 14%, or 86, 88% accuracy, if I'm not wrong.
00:38:17
Speaker
But the interesting thing is the approach. This was the first time a GPU was used. So for all the guys who invested in NVIDIA stock, this is your origin story.
00:38:29
Speaker
For the first time, a GPU was used to implement a neural network, applying all the concepts that had been introduced till that point, implemented on large-scale data and reasonably good compute.
00:38:47
Speaker
This GPU was the GTX 580, if I'm not wrong. It had hardly around a thousand compute cores.
00:39:01
Speaker
Maybe less than that, if I'm not wrong. Less than 1,000; I'll have to cross-check that number, but a very small number of GPU cores, right? Even NVIDIA was not looking at this as a use-case area at that point in time. The thesis is: okay, neural networks need a lot of parallelization. So if we can parallelize this on reasonably large compute, we can have a somewhat larger model and make the model learn faster.
00:39:29
Speaker
This was a problem on CPUs. If I increase the number of layers and add more nodes, it will inevitably take a very, very long time. It was not scalable. So people thought: okay, that's why neural networks will never be scalable.
00:39:46
Speaker
Because you can't do this on a 16-core CPU. At that time, it would probably be an eight-core CPU with a clock speed of less than 1500 MHz, which is a very small clock speed, and a small CPU. So you can't have a model built on this.
00:40:03
Speaker
So 2012 is when people like Alex Krizhevsky, Ilya Sutskever, and Professor Hinton came in. Hinton was one of those guys who had been investing in neural networks in parallel, as much as Yann LeCun, for example. For a very, very long time, there were very few professors who were in deep learning and continued to invest in deep learning.
00:40:25
Speaker
Hinton introduced a bunch of new ideas as well in 2006. There was a paper around DBNs, Deep Belief Networks.
00:40:36
Speaker
It was a different way of training a neural network model. So AlexNet was introduced in 2012, where accuracy suddenly jumped. That triggered a lot of attention in academia. But one question I have.
00:40:52
Speaker
AlexNet ditched the expert approach? Yes, it's not an expert approach, right? I simply build a very large neural network. In fact, this is the interesting idea as well.
00:41:03
Speaker
So they built close to seven convolutional layers plus a fully connected network, a seven-or-eight-layer neural network.
00:41:15
Speaker
They did not define what to do at each layer manually. There is no manual definition of anything of that sort, right? I simply pass my image and I correct the error in my final layer by looking at the label.
00:41:29
Speaker
Compare the predicted label and the actual label and propagate the error back through my neural network. When they trained this model at the scale of the ImageNet dataset, the model was able to scale with large enough data. Then, when they started looking at what's happening at these layers by looking at the attention maps, they learned that each layer is doing a certain job automatically, even though we are not defining it manually.
00:41:59
Speaker
For example, a certain layer is responsible for learning more around the edges. A certain layer is responsible for learning whole objects. A certain layer is responsible for understanding the color textures, for example.
00:42:12
Speaker
So whatever we used to do manually through expert-based systems, now, if you scale the model large enough, it was able to do automatically. This was the trigger for everyone, right? People said: fine, now you can build a very large neural network and you don't need to define this manually.
00:42:29
Speaker
And you can apply it to reasonable data, as long as the data size is there. So this is probably the beginning era of new-age AI, and new-age startups as well.
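The loop described above, pass the input forward, compare the predicted label with the actual label, and push the error back into the weights, can be sketched with a one-layer model in NumPy. All data here is synthetic and the model is a single sigmoid layer, not AlexNet's deep convolutional stack; the point is only the error-driven weight update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled images: 4-feature inputs, binary labels
# generated by a hidden linear rule the model has to rediscover.
X = rng.normal(size=(200, 4))
y = (X @ np.array([1.5, -2.0, 0.5, 1.0]) > 0).astype(float)

w = np.zeros(4)  # weights learned from scratch, no manual feature definitions
for _ in range(500):
    pred = 1 / (1 + np.exp(-(X @ w)))   # forward pass (sigmoid output)
    error = pred - y                     # compare prediction with the label
    w -= 0.1 * (X.T @ error) / len(X)    # propagate the error back into the weights

pred = 1 / (1 + np.exp(-(X @ w)))
accuracy = float(np.mean((pred > 0.5) == y))
```

Nothing about the features is hand-defined; the weights end up encoding the hidden rule purely from label feedback, which is the shift away from expert systems the speaker is describing.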
00:42:42
Speaker
So 2010, 2012, yeah. So, in the 90s and 2000s, there was this Deep Blue, IBM's Deep Blue, which beat Garry Kasparov, the world chess champion at that time.
00:42:56
Speaker
Yes. Was that an expert system or was it a neural network? No, that's an expert system as well. In chess, the number of strategies is pretty much limited, or at least the number of strategies required to beat a human is very limited.

Expert Systems and NLP Evolution

00:43:12
Speaker
So it's more of a search function. You have multiple strategies and you try to apply that search function depending on the strategy currently in play. So this is again 1997, '98, when Deep Blue was there.
00:43:26
Speaker
And post that, there was an intermediate bunch of news as well, right? You had Watson winning the Jeopardy game in the 2010s; I believe around 2010, around the same time.
00:43:43
Speaker
That's an NLP problem, right? This is what I'm saying: from the 2000s to the 2010s, a lot of attention was around NLP because of search. A huge amount of data was now indexed, which is the web, and people were trying to use that web. Wikipedia had come out. For Jeopardy,
00:44:03
Speaker
the Wikipedia dataset was a major required dataset, right? The approach that IBM took at that time is something called knowledge graphs. Even today, people use the knowledge graph approach for certain kinds of models, for certain kinds of systems as well.
00:44:21
Speaker
The knowledge graph approach is: you try to identify a bunch of entities, which are things like city names, country names, actors, like that.
00:44:35
Speaker
Jeopardy is primarily around questions of knowledge, right? Like general knowledge between objects: movie stars, locations, poets, like that. It was primarily around nouns, in short. They created this knowledge graph system using that as the approach.
00:44:58
Speaker
And they were able to win the Jeopardy game. Same thing: to beat a human expert in a game, the number of searches, or the search function, depends on the problem statement, of course.
00:45:12
Speaker
First it was chess, then Jeopardy. Probably the next most complex one is AlphaGo in 2016-17. That, again, is a search-based system, but with a different kind of approach. We'll discuss that.
00:45:26
Speaker
But these two were the main things that happened in that 1990s-to-2010s kind of era. A knowledge graph is just a system of organizing knowledge, of information? Yes, it's indexing. You try to keep related concepts closer together. Then you create that as a mesh of a network, such that when you need something, whatever is closest is hopefully the closest to answering that problem, for example.
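A knowledge graph of this kind can be sketched as a set of (subject, relation, object) triples with a lookup over them. The triples and relation names below are invented for illustration; production systems like Watson's were vastly larger and more sophisticated:

```python
# Toy knowledge graph: each fact is a (subject, relation, object) triple.
triples = [
    ("France", "capital", "Paris"),
    ("France", "located_in", "Europe"),
    ("Paris", "located_in", "France"),
]

def query(subject, relation):
    """Follow the edges out of `subject` that carry the label `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

print(query("France", "capital"))  # ['Paris']
```

Answering "what is the capital of France?" then reduces to extracting the entity and the relation from the question and walking the graph, which is the indexing idea the speaker describes.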
00:45:57
Speaker
So, in the early NLP era, this knowledge graph construction used to happen using a lot of things. This is again the time when we started as well. There is something called ontologies.
00:46:14
Speaker
An ontology is nothing but a dictionary. For example, you have an English dictionary, which has keywords of the whole English language, right? Then, if I want to describe a certain industry, let's say computer science, for example.
00:46:30
Speaker
In computer science, you have things like 'artificial neural network' as one term, which is a concept, right? You have a bunch of concepts. So in the 2010s people started creating these ontologies. They had things like a medical-specific ontology, a general knowledge ontology, a movie-story ontology, for example.
00:46:52
Speaker
So ontologies are the base for creating these knowledge-graph kinds of systems. This is again the early era, where classic NLP was largely done through this kind of approach.
00:47:07
Speaker
You use things like entity recognition and grammar recognition, and you use those inputs in the algorithm. Grammar recognition, for example: what is a word? If it is a verb, you will keep that word. If it is a preposition, for example, you will remove it, because it hardly has any impact on the search function.
00:47:29
Speaker
When you are searching or asking something, your noun is the most important. For example: what is the capital of France? France is a noun. Now, next is: what is it that you want to know about that noun? Next is the capital.
00:47:43
Speaker
Capital is the next one. And next is: what is it about the capital that you want to know for that noun? You want to know, for the noun France, the piece of information called 'capital'.
00:47:56
Speaker
So this is the classic NLP approach. You can convert any text query input into a format like this, and that's how people were doing search. It may look rudimentary now, but at that time it was very advanced.
00:48:12
Speaker
And to get this relationship, right, what is the relationship between 'capital' and 'France' in a given query, people used very manual NLP techniques and libraries.
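The keep-the-nouns, drop-the-function-words step the speaker describes can be sketched with a hand-made part-of-speech dictionary. Real systems used trained taggers from libraries rather than a lookup table like this; the `POS` table below covers only this one example query:

```python
# A toy part-of-speech table, invented for this single example. Real pipelines
# used statistical POS taggers (e.g. from NLTK or, later, spaCy) instead.
POS = {"what": "PRON", "is": "VERB", "the": "DET",
       "capital": "NOUN", "of": "PREP", "france": "PROPN"}

def parse(query):
    tokens = query.lower().rstrip("?").split()
    # Drop determiners, prepositions, etc.: they barely affect the search function.
    return [t for t in tokens if POS.get(t) not in ("DET", "PREP", "PRON", "VERB")]

print(parse("What is the capital of France?"))  # ['capital', 'france']
```

What survives is exactly the entity ("france") and the attribute being asked about ("capital"), which is the structured form the speaker says every query was converted into.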
00:48:25
Speaker
I remember there was a library called spaCy. People even use spaCy today. spaCy came around 2015. But at that time, there were a bunch of other libraries that we were using as well.
00:48:38
Speaker
Because whenever there is text information, you have to understand it in some manner. This is what we were using it for. The reason why I'm explaining this is that it is now closely related to what we are doing in LLMs,
00:48:49
Speaker
how we are doing it in LLMs, right? I think you always have the initiation of an idea, then a spreading out or maturing of the idea, and then the scaling of the idea. This is what happened in AI, time and time again.
00:49:05
Speaker
What is the difference between a CPU and a GPU? Oh, yeah. Okay. So a CPU is largely a series-based system, right? You have, let's say, a 16-core CPU. Your laptop is powered by a motherboard, and the motherboard requires a CPU to do things like your programming and all that stuff.
00:49:24
Speaker
Most functions are series functions; all your classic requirements are simple series-related kinds of functions. Then there are times when you need a lot of parallelization. Parallelization as in: I want to do a bunch of functions at the same time, aggregate them, and do the next set of functions. Like streaming a video, for example, or simulating a game, simulating weather, doing a bunch of calculations across multiple datasets. This is a parallelization problem.
00:49:58
Speaker
To do parallelization, I can do that on a CPU, but I have a limited number of cores, maybe 32 cores or 16 cores, depending upon your RAM size and your CPU processor.
00:50:11
Speaker
Whereas a GPU has a larger number of cores, right? Like the 1080 had close to 1,000-odd cores.
00:50:18
Speaker
And today's traditional consumer-grade GPU, like the 4080, for example, has somewhere between two thousand cores to even four or six thousand cores. So you increase the number of cores. This is again the story of NVIDIA as well, right? One idea is: okay, let me increase the number of compute cores.
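The series-versus-parallel distinction can be sketched with a thread pool standing in, very loosely, for GPU lanes. This is only an analogy: 8 Python threads behave nothing like thousands of hardware cores, but the shape of the computation, the same small operation applied independently to every data element, is the same:

```python
from concurrent.futures import ThreadPoolExecutor

pixels = list(range(100_000))          # stand-in for one frame's pixel values

def shade(p):                           # one small operation per data element
    return p * 2 + 1

# CPU-style: a single core walks the data serially, one element at a time.
serial = [shade(p) for p in pixels]

# GPU-style in spirit: many workers apply the same operation to different slices.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(shade, pixels, chunksize=12_500))

assert serial == parallel               # same result, computed data-parallel
```

Because each element is independent, adding more lanes divides the work cleanly, which is why frame rendering and, later, neural network training mapped so well onto GPUs.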
00:50:42
Speaker
Again, this is around the time when NVIDIA started, right? From the 1990s, the idea is: why only have a CPU-based processor? Why can't we have a parallelization kind of processor?
00:50:52
Speaker
Their first GPU processor came in the late 1990s. Until then, for almost six, seven years, they were simply doing R&D and product building. The initial use case for the GPU was gaming, because in a game you have to load a bunch of frames, and frames have a bunch of pixels.
00:51:14
Speaker
So instead of loading those pixels one after the other, they try to load those pixels in parallel, such that you have a very seamless experience when you are playing a game.
00:51:25
Speaker
So this is the approach for a GPU, where you do a bunch of parallelization. And NVIDIA was investing a lot in this because of the boom in gaming.
00:51:40
Speaker
AI was never in the picture until 2012. There was early analytics being done on GPUs, but not to this scale. So NVIDIA was primarily trying to support the requirements of a gamer, always with that as a preference and a priority, until 2020 even.
00:52:02
Speaker
The gamer was the priority. They were launching a bunch of consumer-grade GPUs and large enterprise-grade GPUs as well, primarily for large-scale processing kinds of tasks.
00:52:14
Speaker
Okay. You mentioned on AlexNet that they saw the attention map. What does that term mean, attention map?
00:52:24
Speaker
Like you said, there were seven or eight CNN layers, and then they looked at the attention map. And this word 'attention' is also kind of important in AI, so that's why I want to understand it. Yeah. So, when I pass a new input to the neural network, right, so I've trained the model, for example, while training I want to understand which node is getting activated. Activation maps, in fact, not attention maps; my mistake.
00:52:53
Speaker
What activation maps do is: whenever information is flowing through the network, you see which neuron is activated by what information input. So for example, let's say I have a pixel and I'm passing that pixel through the neurons. Which neuron is activating for that pixel, for what kind of information?
00:53:14
Speaker
So when you map these activations, you know which neuron is activated for which kind of input. Let's say for edges it is activating a lot, which means this neuron is responsible for detecting the edges. This is where explainability comes in, right? This is a new concept in AI, which is explainability.
00:53:39
Speaker
The model is giving me the classification, but on what basis is the model giving me the classification? In expert systems, it's easy to interpret, because I know which rule or which function is activated. And based on that, I know on what basis the object was identified.
00:53:58
Speaker
Like, for example, your credit scoring. Even today, credit scoring is very much an expert-based system. If your credit score gets 10 points better, you know that this is because I have given a 10% weight to your repayment rate. Your repayment rate, meaning how well you are paying your credit card bills: if that is on time, your score is always good. If that is not on time, then your score will get affected by 10 basis points.
00:54:26
Speaker
The rule triggered is this one. You have very clear traceability going back to the exact area or input where the model has taken that call.
00:54:38
Speaker
In neural networks, there are multiple ways you can understand that. One is using these activation maps.
00:54:49
Speaker
Activation maps: when you run the input, you observe which neurons are getting activated and you highlight those neurons as the responsible neurons. Same thing.
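The observe-which-neuron-fires idea can be sketched with a single toy layer. Weights and input here are invented; in a real network you would record these activations per layer across many inputs to build the map:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))            # one toy layer: 4 inputs feeding 3 neurons

def activations(x):
    return np.maximum(0.0, x @ W)      # ReLU: a neuron "fires" when its output > 0

x = np.array([1.0, 0.0, -1.0, 0.5])    # an invented input
acts = activations(x)
responsible = int(np.argmax(acts))     # the neuron this input activates most
print(acts, "-> most responsible neuron:", responsible)
```

Repeat this over many edge-heavy inputs and, if one neuron consistently tops the list, you attribute edge detection to it, which is the interpretation step the speaker describes.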
00:55:00
Speaker
Likewise, in neural networks, you have another method called integrated gradients. It calculates the gradient values along a path and aggregates them. You have a bunch of different ways to do this, by going through the neural network and trying to understand how the neural network has given you that output.
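Integrated gradients can be sketched on a toy differentiable function. The model `f` and its hand-written gradient below are stand-ins; in practice the gradient comes from the network via automatic differentiation:

```python
import numpy as np

def f(x):                # toy "model": f(x) = sum(x^2), invented for illustration
    return float(np.sum(x ** 2))

def grad_f(x):           # its gradient, 2x, written by hand here
    return 2 * x

def integrated_gradients(x, baseline, steps=101):
    # Average the gradient along the straight path from baseline to x,
    # then scale by the input difference (the standard IG formulation).
    alphas = np.linspace(0.0, 1.0, steps)
    grads = [grad_f(baseline + a * (x - baseline)) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

x = np.array([1.0, 2.0, 0.0])
attr = integrated_gradients(x, baseline=np.zeros(3))
# Completeness property: the attributions sum to f(x) - f(baseline).
```

Each attribution says how much that input feature contributed to the output relative to the baseline, giving the traceability for neural networks that rules give for expert systems.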
00:55:22
Speaker
So, summarizing what happened from, let's say, 2000 to the 2010s, plus or minus four or five years: there was a focus on NLP, because search attracted a large amount of attention, because you suddenly saw a lot of data being indexed, right?
00:55:40
Speaker
And there was Watson, which demonstrated the value of knowledge graph systems in the Jeopardy game. And they did a bunch of marketing around Watson.
00:55:50
Speaker
Even today, people use Watson for different things, but a lot of that back-end technology has changed in the last few years anyway. Then 2012 brought the AlexNet paper, right?
00:56:03
Speaker
Now you will see very much a template, okay? There is this lab that created an algorithm which beat the industry benchmark.
00:56:14
Speaker
And that approach is well appreciated and used for many other use cases as well. Following that, another lab introduced a new technique and beat the AlexNet result in 2013. This was a team from Clarifai, if I'm not wrong.
00:56:35
Speaker
Clarifai is a startup; they exist even today. So 2012, 2013 is when many of these things happened. Whatever you have seen in the last two years happened at that time for us, when we started.
00:56:48
Speaker
Because in the academic community, deep learning suddenly became a big focus, saying that now we can build large-scale neural networks and learn things like images, which was not scalable till that point in time.
00:57:06
Speaker
So many people came up with different approaches to commercializing it or investing around it. This is when people started investing in R&D again, at a scale as large as in the 1980s.
00:57:19
Speaker
This was the first time in a while that many people were investing, at the same time, large enough amounts in very focused and very important research on commercial applicability.
00:57:33
Speaker
When you had this algorithm, CNNs, for example, scaling for problem statements like images, there are two things that are quite important to scale to the next stage, and one is the commercial applicability of it. Commercial validation is always important to cross-invest in the R&D and upscale the R&D, for example.
00:57:57
Speaker
While the CNNs were implemented for image-related problems, parallelization and implementing deep learning at large scale on GPUs started scaling up.
00:58:08
Speaker
And people started applying that to different kinds of problems, not just images, but also things like text. I discussed the NLP approach, right? Where you use things like entity recognition and then do lexicology- or ontology-based NLP.
00:58:27
Speaker
People said: why don't we do the same NLP tasks using deep learning, using neural-network-based systems? This was when a few people started working on it, people like Richard Socher.
00:58:42
Speaker
He's now the founder of You.com. At that time, his work was some of the good work around NLP. Like, as I said, when somebody asks something, you need to find the relationship between the different components, right? Your noun versus your verb.
00:58:58
Speaker
What the relationship is is what you need to identify, and you were using classic NLP techniques for it. So instead of doing all of that manually, now I train a model; in this case, it's RNNs.
00:59:10
Speaker
CNNs are largely for parallel kinds of learning problems. RNNs are primarily for series-related problems, and language is always a series-related problem, right? For example... What is the full form of RNN here?
00:59:25
Speaker
Recurrent neural networks. Okay. Recurrent means what here? Recurrent meaning you have a block which is connected one to another, the learning function runs one step after the other, backpropagation happens, and it's more like a chain of neural network blocks.
00:59:40
Speaker
Okay. So language is a series problem, right? For example, 'I am the President of the United States': 'I' is related to 'am', 'am' is related to 'the', 'the' is related to 'President', and 'President' is related to 'United States'.
00:59:58
Speaker
Everything is recurrent, right? So the idea is: if I remember this sequence of data correctly, I then know which word is responsible for which kind of activity, right?
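The word-by-word recurrence can be sketched as a minimal RNN cell in NumPy. Weights and "word vectors" below are random toys, untrained; the sketch only shows the state being carried from one token to the next:

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_hid = 3, 5
Wx = rng.normal(scale=0.5, size=(d_in, d_hid))   # input-to-hidden weights (toy)
Wh = rng.normal(scale=0.5, size=(d_hid, d_hid))  # hidden-to-hidden: the recurrence

def run_rnn(word_vectors):
    h = np.zeros(d_hid)                  # state carried from one word to the next
    for x in word_vectors:               # process tokens strictly in order
        h = np.tanh(x @ Wx + h @ Wh)     # new state mixes current input with the past
    return h

sentence = [rng.normal(size=d_in) for _ in range(4)]   # four toy "word vectors"
forward = run_rnn(sentence)
reversed_state = run_rnn(sentence[::-1])
# The two final states differ: the network is sensitive to word order.
```

Because each state depends on everything that came before it, word order changes the result, which is exactly what a series problem like language requires.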
01:00:13
Speaker
We are now talking about a somewhat high-dimensional space. Initially, I used to do this manually, by saying 'am' is, let's say, this part of speech, and 'United States' is the noun, for example.
01:00:28
Speaker
I used to have a dictionary and pre-labeled data, and I used that to create a logic, an algo or something like that. Instead of doing that, now I'm building an algorithm, right?
01:00:39
Speaker
So early NLP work was this: I'll train a very large neural network in a very high-dimensional space. I'll put each text into a vector.
01:00:51
Speaker
A vector of, let's say, 100 dimensions, 1,000 dimensions, like that. Then I use that vector as my input into the neural network. These we call embeddings. I convert the text into embeddings, I feed the embeddings into the model, which is the neural network, and I train the model on those embeddings.
01:01:09
Speaker
Then, once the model training is completed, if I look into these embeddings, when I give a word, there is an embedding vector that will be generated.
01:01:24
Speaker
When I plot that vector in this high-dimensional space, we started observing that the model was contextually able to understand a lot of information automatically.
01:01:36
Speaker
So for example, when people trained this recurrent-network model on very large data like the Wikipedia data, for example, and the Wikipedia data itself is a very big corpus, after training, when they started plotting these graphs, let's say for all the nouns, they observed that the embedding vectors of all the nouns were in the same region, which means the model was able to understand that this is a noun, and certain other things also.
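The words-of-the-same-kind-cluster-together observation can be sketched with cosine similarity over toy vectors. The 3-dimensional embeddings below are invented by hand; real models learn hundreds of dimensions from data:

```python
import numpy as np

# Invented 3-d embeddings, hand-placed so that the two city names sit close
# together and the verb sits far away, mimicking what trained models learn.
emb = {
    "paris":  np.array([0.90, 0.10, 0.00]),
    "london": np.array([0.85, 0.15, 0.05]),
    "run":    np.array([0.00, 0.10, 0.95]),
}

def cosine(a, b):
    """Similarity of two vectors by the angle between them (1 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

city_sim = cosine(emb["paris"], emb["london"])   # high: related concepts
verb_sim = cosine(emb["paris"], emb["run"])      # low: unrelated concepts
```

Plotting such vectors is how researchers saw nouns landing in the same region: nearby vectors mean the model has placed related words close together, without anyone defining "noun" manually.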
01:02:09
Speaker
So this is a very interesting idea now: what happened with images, where I don't need to define the borders, texture, color separately, I can do that by having a neural network that can learn it.
01:02:25
Speaker
Now I can apply the same thing to language as well. So this was the beginning of a new era of research around NLP.
01:02:35
Speaker
So there is one stream of research around CV, one stream of research around NLP. And then the other stream was continuously around speech, translation, and all that stuff.
01:02:47
Speaker
In translation also, we have seen a very good amount of success by using things like RNNs, for example, neural-network-based systems; the same again for speech. Okay.
01:03:00
Speaker
So the 2010s, which is after AlexNet in 2012, by 2015, is when many people started understanding that there is value in commercializing this. And we believed there is a lot of value in commercializing, which is where startups like ours started, and then there were people like Clarifai and MetaMind.
01:03:22
Speaker
There were a bunch of, like, 50-odd startups which were early deep learning startups. Very much like what you have seen in recent times, right? Like a lot of foundation-model companies came up because there was one research output.
01:03:36
Speaker
Same thing. There was a research output, there were a bunch of labs already doing similar kinds of research, and they said: okay, now let's commercialize this because we have a reasonably good model in place.
01:03:48
Speaker
So the initial approach was model as a moat, meaning: I have a better model as compared to others, so you could use me to have better performance on that output. But we were focusing on a different kind of problem.
01:04:02
Speaker
We were looking at building an AI assistant for STEM researchers. We were coming from an academic background; we were doing research. What do you mean by 'we' here? Who all were there? And when did you discover the world of AI?
01:04:21
Speaker
Oh, yeah. So this is way back in 2012, 2013. I was then at IIT Bombay, in my third year and moving into my fourth year. My program is a dual degree, which is B.Tech plus M.Tech masters.
01:04:39
Speaker
As part of the masters, you have to do research, right? So in your pre-final year, before your final year, you do a thing called a literature review.
01:04:50
Speaker
The typical process is you'll go to a professor and tell them: look, I'm interested in this topic, I would love to do research on this topic, what should I do? The professor will tell you to read a bunch of papers for the next six to nine months and ask you to figure out an area where you can work with him on that research.
01:05:11
Speaker
This is the way you would typically do research in a PhD or a master's or a research project, for example. I stumbled upon the AlexNet paper when I was going through that literature review. The idea was that my specialization had to be in computer science, meaning I had to pick up some topic related to computer science.
01:05:32
Speaker
And I was fascinated with AI to some extent, because it was a very early idea; there was a lot of fascination. And I thought maybe this is something that I could do some research around.
01:05:46
Speaker
And I stumbled upon the AlexNet paper at that time. But I had to apply that to my research problem, which was in mechanical engineering. So I started using the CNN architecture for my research thesis, and that's when I was introduced to deep learning.
01:06:05
Speaker
This was again a very small community, but what happened is there was a back-to-back increase in the number of papers talking about this concept and this idea.
01:06:17
Speaker
That was the same time when we were facing the problem — at least I was facing the problem — around literature review, right? So I read a bunch of papers around NLP as well, in terms of how we can learn language and everything.

AI Startups and Research Challenges

01:06:34
Speaker
So the idea at that time was: if I want to do a literature review, come up with a topic, go to a professor, and spend my next one or two years on that topic, I should do that literature review as well as possible, so that it stays true for the next one to two years. Right? Right.
01:06:54
Speaker
But the problem is, we were seeing so many papers being published. I think at that time close to 30 papers were being published on a peer-reviewed basis, which means 30 new ideas were being published across topics.
01:07:08
Speaker
That itself is so much new knowledge coming into the market — forget about what's happening currently; I'm talking about the scenario 12 years back. We were using tools like Google Scholar, Scopus, and the Google search engine.
01:07:27
Speaker
Search engines at that time were also simply keyword search engines. In 2012, when you typed in something like "capital of US," you were given results that contained the keywords "capital" and "US," and the ranking was a PageRank algorithm, right? More of Google's investment was in PageRank, not in the NLP part, because NLP had not evolved at that time.
01:07:53
Speaker
We said this is not ideal for a research area. Let's say I'm typing something around how to do speech recognition — I don't mean the keywords literally. What I mean is that I'm looking for an algorithm which can help me do, let's say, a token-to-token conversion from audio to text.
01:08:16
Speaker
That's my intention. But I can't type all of that into a search engine, and I may not even know this information from an academic point of view. So that was the idea. Meaning, one, you have so much information that there are no experts anymore,
01:08:33
Speaker
and two, the search engines were very rudimentary tools. So why can't we use this technique, deep learning, and what is being used in NLP, and scale it up to things like a search engine, and eventually to an assistant kind of product?
01:08:49
Speaker
So our journey was: 2013 is when we started the company, early 2014 is when we raised the money, and we were seven researchers at that point of time.
01:09:01
Speaker
We were hiring— One question. How did you raise money? Because it sounds very academic to say that...
01:09:12
Speaker
...researchers need a way to access research and be up to date, et cetera. It doesn't seem like a commercial opportunity, or the kind of thesis a VC uses doesn't seem to be there. So I'm just curious about that. Sure. When we started, the idea was to start something which could solve the problem for researchers and build an AI assistant and a search engine, right? Yeah.
01:09:41
Speaker
We went to a few researchers — sorry, a few VCs, a few angel investors at that point of time. This is way back in 2013, right? The VC community in India was pretty much non-existent. The angel community was very, very small. There was only Indian Angel Network at that point of time, as a network.
01:09:58
Speaker
And there was Mumbai Angels, who was doing a few investments. There was TiE Global, the industry entrepreneurship kind of thing. So this was pretty much purely accidental.
01:10:10
Speaker
We were talking to angels at that point of time, explaining that we wanted to build an AI search engine for researchers and eventually an assistant.
01:10:21
Speaker
But our initial approach was to build a community around a platform and collect data. That was the idea. At that time, we called it InvenZone.com.
01:10:35
Speaker
It's called InvenZone. Whether we still own that domain or not, I'm not sure. That was the idea. When we went to them, we told them: here is the problem statement. There is so much information out there in the industry, in the market.
01:10:53
Speaker
The current tools are very, very rudimentary. If I use these tools, I may spend the next six months to find an answer or the right information, or I may never be able to find the information at all. This problem is only going to grow further, right? We want to solve this problem, starting with STEM researchers.
01:11:16
Speaker
They only liked that we were technical, coming from a university with a background in research, and from IIT Bombay. Those were the only plus points. Otherwise, there was no validation at all. There was no tech company at that point of time in India.
01:11:36
Speaker
There was no AI company at that point of time, so there was no funding around tech-driven companies — and it wasn't there for the next few years either. They liked our honesty, and the investment was also quite small; it's not that much. So we raised it from around 20-odd angel investors.
01:11:56
Speaker
And that was good enough for us to get started with. But that's when the actual problem began, right? The money problem was solved to some extent, but the real problem was to solve the problem — to build the company we wanted to build.
01:12:10
Speaker
Now, let's look at the scenario. There was no university teaching deep learning. There was no compute accessible in the cloud. Today, you can get a GPU on the cloud.
01:12:22
Speaker
GPUs on the cloud weren't available even seven years back — that only happened in the last seven years. So GPUs were not available anywhere.
01:12:35
Speaker
We had no clue how to start coding and implementing those things. We were simply hiring people largely based on math and computer science backgrounds.
01:12:46
Speaker
We were always that odd-one-out kind of team. We were hiring people who may not be very good in academics but very good at coding, for example —
01:12:58
Speaker
who may have built very good algorithms, but whose academic scores were quite poor. We were the odd team, seven people. All of them were recent grads, pretty much.
01:13:09
Speaker
All of them were recent grads, in fact. Our initial way of approaching the problem was by reading papers, going through the open source attached to those papers,
01:13:20
Speaker
and playing around with a bunch of tools and implementing them. There was no TensorFlow, there was no PyTorch, there was no scalable deep learning framework at that point of time. What is TensorFlow?
01:13:36
Speaker
TensorFlow and PyTorch are the libraries you would use to build a deep learning model, like Spark or Scala in their domains. These are code libraries. Yes, Python libraries. So when you want to write a neural network, for example, instead of writing it in C++, you can write it in Python using these libraries.
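As a rough illustration of what these libraries abstract away, here is the bare forward pass of a tiny two-layer network written directly in NumPy — the shapes and random weights are purely illustrative, not any model discussed here:

```python
import numpy as np

# Tiny two-layer network: 4 inputs -> 3 hidden units -> 1 output.
# Libraries like TensorFlow/PyTorch generate and optimize this kind of
# computation for you; here it is spelled out by hand.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)              # hidden layer with ReLU
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

y = forward(np.array([1.0, 0.5, -0.2, 0.3]))
print(y.shape)  # a single probability-like output
```

In a framework, the same thing is a few declarative lines plus automatic differentiation for training, which is exactly why their arrival mattered.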
01:13:55
Speaker
At that time, we were using C++, or there was an NVIDIA framework called DIGITS. We were using DIGITS because we needed CUDA programming.
01:14:06
Speaker
Okay, one more question. This "deep learning" as a term refers to the neural network frameworks — RNNs, CNNs, all these innovations which happened — collectively, you're talking about that as deep learning?
01:14:20
Speaker
Correct. A neural network is, you know, a bunch of perceptrons talking to one another, learning from one another. Deep learning as an academic concept consolidates everything which uses a neural network-based approach, like RNNs and CNNs. All of them come under deep learning as a concept.
01:14:38
Speaker
Okay. Yeah, so what we did is we set up our own work machine at that point of time.
01:14:50
Speaker
We bought a 1080 GPU at that time for deep learning. We were probably the only guys in India then using deep learning on a 1080. It was being used in the US for sure, and in some areas of Europe — research was happening there.
01:15:04
Speaker
We brought in 1080 GPUs and started using those to build these models. So our approach was: we started to build the AI assistant for STEM researchers.
01:15:18
Speaker
We scraped the internet and indexed close to 34–35 million research papers. At that time, the total number of research papers was around 45 to 50 million. So out of 50 million research papers, we were able to scrape around 30–35 million, which looked good enough.
01:15:33
Speaker
This scraped information is sometimes, you know, without consent and all that. This is classic: you go through a website, scrape whatever is available to scrape, and then use that information for indexing.
01:15:48
Speaker
So we picked up these 30 million research papers and built an RNN model, which was probably one of the biggest at that point of time for researchers. At that time, there were Wikipedia-based vectors —
01:16:01
Speaker
this is the Word2Vec model. People had used Word2Vec to convert a bunch of datasets and build the embedding space I mentioned, right? The weights, the pre-trained model — like an LLM in the current scenario.
01:16:14
Speaker
At that time, it was the Word2Vec kind of model. So we scraped the internet, indexed a bunch of research papers, and built this model. Then we started using that model as the base for our search engine. So, like the RAG that exists in the current scenario, we were using a similar kind of concept. The idea is you use RAG —
01:16:38
Speaker
Retrieval-Augmented Generation. Okay. So in the current scheme of LLMs, when you ask a question to an LLM,
01:16:50
Speaker
you would have indexed a bunch of snippets from your data as vectors, and you do a cosine similarity between your input and the indexed vectors, which gives you the closest contextually similar information from your data.
01:17:11
Speaker
You pick up the top three of them and give this information to the LLM as part of the context, saying: this is the question, and this is the similar information I retrieved from my data. The LLM will use the question and the similar context to answer the question.
01:17:30
Speaker
This is RAG. So one way to get an answer from an LLM is directly taking the answer from the model, from the pre-trained or fine-tuned weights.
01:17:41
Speaker
But the problem is it may not have all the knowledge required for your question — say, your domain specifications or your notebooks or anything like that.
01:17:52
Speaker
So instead of training the model on this data again, you use a technique called RAG, where you shortlist information and give it to the model to summarize and give you the answer.
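The retrieval step described here can be sketched in a few lines. The toy three-dimensional "embeddings" below are illustrative stand-ins for real Word2Vec or LLM vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors: 1.0 = identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_top_k(query_vec, doc_vecs, k=3):
    # Score every indexed snippet against the query and keep the top k indices.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy 3-dimensional "embeddings" for three indexed snippets and a query.
docs = [np.array([1.0, 0.0, 0.0]),
        np.array([0.9, 0.1, 0.0]),
        np.array([0.0, 1.0, 0.0])]
query = np.array([1.0, 0.05, 0.0])
print(retrieve_top_k(query, docs, k=2))  # indices of the two closest snippets
```

In a RAG pipeline, the snippets at those indices would then be pasted into the LLM's context alongside the question; the same cosine-similarity ranking is also the core of the contextual search engine described here.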
01:18:07
Speaker
Okay, I remember in the early days of ChatGPT, there was this idea that prompt engineering would be the next big profession and you would need to learn it — and today nobody talks about it. Is that because of RAG?
01:18:20
Speaker
No, because of reasoning, in fact. I'll come to that as well. But RAG is something people use even today. If you want to use any kind of LLM in an enterprise application — a question-and-answer system, a chatbot, anything like that — you always use RAG. Or you do SFT, which is supervised fine-tuning.
01:18:42
Speaker
Or an SLM, if you have good enough data — an SLM is a small language model. But this is the current method. At that time, we were doing a similar kind of methodology. We had a bunch of research paper abstracts.
01:18:56
Speaker
We were doing a search engine, right? Which means as long as I can extract contextually similar data, I can answer your question, which is the search query. We were probably one of the first search engines using contextual-similarity-based search.
01:19:12
Speaker
Specifically for STEM research, we were the only guys at that point of time. We started doing a few interesting things after that as well. The idea was we combined classic NLP plus a deep learning model to create something like a knowledge graph, but in a different kind of manner.
01:19:31
Speaker
We said: for a STEM researcher asking a question, it's not the paper with the most references that is important, but the paper that could get the most references in the future, right?
01:19:47
Speaker
For example, if a paper has, let's say, 10,000 references, that approach and theory would have already been used by many people. I would rather be interested in a paper which was recently published but has something that can be a base for me to build something unique.
01:20:06
Speaker
Novelty is everything in STEM research, right? You have to add the delta, or else nobody cares about your research or your paper. So how do we build that kind of capability into the search engine?
01:20:19
Speaker
By mapping the knowledge graph, we were able to predict the next topic of focus. For example, when a user types ANN — artificial neural network —
01:20:31
Speaker
we know that if I take the projection for the next five years, the next relevant topic could be deep learning, because I see more papers being published in deep learning; and within deep learning, let's say RNNs, for example.
01:20:46
Speaker
Which means if I see that you are looking at deep learning, I'll probably go to RNNs and the domain focus you are looking at. We called this predictive search.
01:20:56
Speaker
So in the search engine, we gave you an option to predict the search queries for the next five or ten years, and your search ranking would change accordingly, which was a very interesting idea. Saying: okay, if I do a relevance search today, this is what the top papers look like.
01:21:15
Speaker
Now, if I change my requirement to, say, the next five years — where would I see more papers getting published? — I would see new kinds of topics emerging in my search results. But there is always a question of the relevance of the trend line, because in research, the disruption is usually a new trend line. A new trend line will always take over from the one it follows.
01:21:41
Speaker
But if you're on a trend line, it's better to look at what's happening in the next five years anyway. This is something we created, and then we wanted to take it to a conversational kind of system.
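As a hypothetical sketch of this predictive-ranking idea (not the actual InvenZone algorithm — the data and the linear extrapolation are illustrative stand-ins), one could fit a simple trend to each paper's citations per year and rank by the projected count a few years out:

```python
def projected_citations(yearly_counts, years_ahead=5):
    # Least-squares line through (year index, citations that year),
    # extrapolated `years_ahead` beyond the last observed year.
    xs = list(range(len(yearly_counts)))
    n = len(xs)
    mx = sum(xs) / n
    my = sum(yearly_counts) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, yearly_counts))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept + slope * (xs[-1] + years_ahead)

papers = {
    "classic, heavily cited": [900, 850, 800],  # big numbers, but declining
    "recent, fast-growing":   [10, 60, 200],    # small numbers, accelerating
}
ranked = sorted(papers, key=lambda p: projected_citations(papers[p]), reverse=True)
print(ranked)  # the fast-growing paper outranks the declining classic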
01:21:51
Speaker
We said, you know, the search engine is still a very rudimentary way of interacting with the data — why can't we have a question-and-answer kind of system? That's what we did as the next iteration of the product. We built a STEM assistant where you could ask things like "which algorithm is more relevant for my research area?" In the backend, it was something like an expert system: I used deep learning to extract certain things, augmented them using classic NLP, and then gave the answer.
01:22:26
Speaker
The problem is it's not dynamic. The NLG part — natural language generation — was non-existent at that point of time. I couldn't generate text properly.
01:22:38
Speaker
If I could do that, then I could combine this and simply give the answer, like in the current state of things. This wasn't possible because of the limitations of deep learning at that point of time. So this was the case until 2017, when the "Attention Is All You Need" paper got published.
01:22:55
Speaker
During that period, we gave up on that idea. Why is that paper important? I'll come back to it. But from 2012 to 2016–17, many people were in a similar stage, like ourselves.
01:23:12
Speaker
They had very strong academic backgrounds, had published certain algorithms, certain ideas, and were now trying to figure out the commercial applicability of those ideas in different areas.
01:23:23
Speaker
Some of them built solutions — like Clarifai, which came up with an image labeling solution for e-commerce. They saw that as a good use case and scaled it. Even today, they still exist and were able to scale up.
01:23:35
Speaker
And there were people like MetaMind, for example — that's Richard Socher, who was doing a lot of work on NLP at that point of time. He was building a deep learning platform called MetaMind.
01:23:47
Speaker
We were using this for STEM research — a search engine slash assistant as the end use case. So even though a lot of revolutionary tech and approaches came out,
01:24:03
Speaker
the success was limited to one or two use cases. Image labeling was one, and to some extent search engines were next. Over the next few years, search engines transformed from keyword to contextual, all

AI Tooling and Platform Development

01:24:17
Speaker
of that stuff.
01:24:17
Speaker
So search engines got somewhere, and text classification got better. Wherever people focused on these commercial problems, they were able to build an enterprise product out of it and scale it. And then there was another class of products: the tool-based startups.
01:24:37
Speaker
At that time, for example, there was H2O.ai, there was Databricks, DataRobot, Dataiku. They were looking at classic ML and big data as a product.
01:24:52
Speaker
So now we are looking at three layers, right? You have tooling, algorithms, and applications. These are the three different things that started happening at that point of time. What is the service provided by the tooling layer?
01:25:06
Speaker
Are these like ontologies? No, no. These are like libraries, for example, or — largely — platforms where you can build models.
01:25:20
Speaker
For example, you would use DataRobot to connect your datasets and build models. They were using classic ML and slowly migrated to deep learning through late 2017 and 2018.
01:25:33
Speaker
We pivoted in 2015–16 from the idea of building an assistant to building a tool, because when we reached this stage of building an assistant and a search engine, we realized this problem of building an assistant could not be solved yet — there were things we were primarily trying to solve but couldn't.
01:25:56
Speaker
One is the memory problem. In a conversation, your follow-up question is always a follow-up to a previous question, which would have been a follow-up to the previous one — you are asking questions on a sequence-to-sequence basis, right?
01:26:09
Speaker
In conversation, this is important: you need to remember what was spoken in the past and continue the conversation. That was one. Second is the NLG part. We were not able to generate text accurately.
01:26:24
Speaker
So far, we had understood the language, but we weren't able to reproduce it — we weren't able to generate new language. The same thing was also happening with images during the same time: we understood an image and classified it,
01:26:39
Speaker
but if you asked me to create an image at that time, in 2014–15, I couldn't do it. We couldn't. So the generation part — NLG or image generation — was non-existent at that point of time.
01:26:52
Speaker
So the things that became successful were the ones doing classification and contextual search — they were successful, both in images and tabular datasets.
01:27:03
Speaker
And then there's the tool part, like DataRobot, for example. We became one of those tools very early on because we knew the challenges of building a deep learning model without a TensorFlow or a Torch being there.
01:27:17
Speaker
So we built a framework called Braid. It's very much comparable to Keras — Keras is a meta-framework too. Braid sat on DIGITS and simplified the way of building a deep learning model.
01:27:31
Speaker
We said, fine, we may not have to solve the end problem ourselves. Wherever a problem can be solved using deep learning, let's build a platform so we can enable others to solve those problems.
01:27:42
Speaker
So we built a platform with all these tools. We launched an open-source library, and we launched an enterprise platform where an enterprise could connect its data and build deep learning models on our platform.
01:27:56
Speaker
We took care of things like compute optimization, inferencing, and model scaling. All of that was offered as a machine-learning-as-a-service kind of platform — a classic MLOps platform, right?
01:28:08
Speaker
We were one of the first around deep learning; otherwise there were products like DataRobot and H2O.ai already doing classic machine learning approaches. So this was the early commercial success in deep learning, 2012 to 2015. How do you monetize MLOps? Is it like per API call, or what?
01:28:29
Speaker
No, at that time it was purely a subscription fee plus compute cost. The subscription fee is what you pay to access the platform, and the compute cost is whatever you consumed.
01:28:40
Speaker
I don't control the models, so I don't control the API volume. You could build an image classification model, a text classification model — it could be anything. You could play with any of those.
01:28:51
Speaker
So this was the early commercialization era for deep learning. We were doing tooling, people were doing solutions, and people were migrating from classic ML to deep learning.
01:29:05
Speaker
2015 is when a lot of scattered strategies emerged — an enormous number of new strategies came in. Because of the early success of NLP, demand for AI bots rose from 2015 onwards —
01:29:23
Speaker
like if-then kind of stuff, right? People thought what we had done in STEM research was scalable that way, which is exactly the same thing that happened in the 1980s with expert-based systems.
01:29:37
Speaker
They said: I can scale the NLP problem by building if-then kinds of rules. This is where things like Dialogflow came into play. Meta, which was previously Facebook, launched bots in Messenger.
01:29:52
Speaker
Alexa got launched at that point of time from Amazon. So this was the early stage of conversational systems as a service.
01:30:03
Speaker
People thought these rule-based NLP systems could scale, and hence started investing in a bunch of bots. 2015, '16, '17 is when people invested a lot in bots.
01:30:14
Speaker
A lot of research was now focused on creating conversation-capable AI systems. Whatever was commercialized fell down; they were never able to scale up.
01:30:26
Speaker
A few people scaled up in very conditional problems. In India, there was Haptik. Earlier, they were called HelloChat; it was a customer care service.
01:30:41
Speaker
They migrated to a bot-driven platform. There was Niki.ai in India solving that problem — Niki has now shut down. Yellow.ai, for example — they were previously
01:30:56
Speaker
these repository kinds of products, like yellow-pages kind of stuff. They migrated to chatbots. So there were a few products that migrated into these enterprise if-then dialogue systems.
01:31:11
Speaker
They were successful for specific problem statements — like IVR: people were okay doing IVR-style flows on chatbots, probably for customer support. If you had that kind of problem, they could localize it and drive the next set of conversation. But the attention on conversational systems went up at that point of time.
01:31:36
Speaker
NLP went up really massively. But you may ask what happened with CV — computer vision. We had the early successes of CV, but it was not scaled. The actual commercial value of CV is in things like medicine — radiologists, for example — or video surveillance, or any video-heavy problem.
01:31:59
Speaker
People were trying to figure out how to use CV in all those commercial use cases. They applied CV to e-commerce use cases, where things like product SKU mapping were scaled. A bunch of startups came in for that use case.
01:32:13
Speaker
They did solutioning as well, and they did very well. And in NLP, whoever did point-focused, specific enterprise problems was able to scale and sustain over the following years.
01:32:25
Speaker
Tools were successful where they were doing classic ML. The tools we were building were for deep learning, which was very niche at that point of time. And the success was largely in B2C areas; B2B was not that successful.
01:32:39
Speaker
Deep learning was not that successful there because of datasets and all that. Of course, if you had a solution, you would have been successful. But otherwise, that was the case. So we had to pivot, because we were very early in the market with a deep learning toolset.
01:32:54
Speaker
The community was very small. So we started saying: why don't we build the end-to-end platform for one industry and create a moat on that basis? Because you have tools, models, solutions, guardrails, risk rails on top — you have a lot of things when you start verticalizing.
01:33:11
Speaker
The verticalization is quite deep. If we build it, that's the moat, right? Because nobody else can build it easily — it's quite hard, and applying it across multiple use cases is quite hard.
01:33:23
Speaker
So we took the verticalization approach: verticalize the platform instead of building a horizontal platform, with solutions and so on. Then 2017 is when we started seeing a new, slow transformation around language,
01:33:38
Speaker
because of the "Attention Is All You Need" paper and the BERT models, right? Again, from 2015 to almost 2018–19, even 2020, the research was very much incremental.
01:33:54
Speaker
There was AlphaGo on the reinforcement learning side, and then you had a few successes, but NLG was a problem. Image generation was a problem.
01:34:06
Speaker
So that became the main focus to solve from 2017–18 onwards, both for vision and for language. In vision, the good thing that happened was the creation of GANs. GANs were introduced around 2014 by Ian Goodfellow.
01:34:33
Speaker
GANs are Generative Adversarial Networks. The idea is: if I'm able to understand an image, can I generate an image as well?
01:34:43
Speaker
For example, today you're seeing all these diffusion-based image generation models; the early version was GANs. Images had a little bit of success because of GANs very early on,
01:34:57
Speaker
whereas for text, it was still building up. So when the 2017 paper came in — the "Attention Is All You Need" paper — it scaled the idea that when you make the model learn not only the sequence, but also the position within the sequence —
01:35:17
Speaker
meaning, in a sentence like "who is the president of the United States," you have to look at all these tokens and how they are positioned in the sentence —
01:35:28
Speaker
if you remember the location of these tokens as well, then you can increase the scale of the model, the scale of the training data, and the scale of the compute.
01:35:41
Speaker
You can build a large enough model which can understand language very well and also start generating language, in rudimentary terms. This was done by Google at that point of time — the "Attention Is All You Need" paper — and they launched a class of large language models called BERT.
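A minimal sketch of the sinusoidal positional encoding from that paper shows how the model gets the "location of these tokens": each position in the sequence receives a distinct vector that is added to the token embedding. Dimensions here are tiny, purely for illustration:

```python
import math

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    # even dimensions get sin, odd dimensions get cos, with wavelengths
    # that grow geometrically across the embedding dimensions.
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
# Every position gets a unique vector, so "who is the president" and
# "president the is who" no longer look identical to the model.
print(pe[0][:2], pe[1][:2])
```

Because the vectors differ per position, attention layers can tell word order apart without any recurrence, which is what made training parallelizable at scale.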
01:36:00
Speaker
BERT is like the OG of LLMs. Anyone doing NLP, any old-school AI person, would look back and say BERT is the main thing they always used.
01:36:12
Speaker
Even today, many people use BERT because it's cost-efficient, resource-efficient, and reasonably successful when the problem is limited, for example. So this was happening from 2017 onwards: NLG and image generation started flowing in, but they never scaled to the point where you'd say there was going to be a disruption.
01:36:36
Speaker
Even into 2020, we were thinking that deep learning was done, because the use case was very limited — only classification.
01:36:47
Speaker
Generation was not picking up. Reinforcement learning was not picking up. Capability was still at a very rudimentary stage from a generation standpoint. The commercial success of deep learning was largely in the B2C case.
01:37:01
Speaker
In B2B, not that much. And even in B2C, it was largely with the players who had a lot of data — they had an advantage of cost and so on.
01:37:12
Speaker
But people continued to invest, saying there is some light at the end of the road where they would have some success. That's what everybody was saying. I want to ask you a couple of questions here.
01:37:28
Speaker
How is reinforcement learning different from deep learning? Reinforcement learning is sometimes classed under deep learning, sometimes not. It has a different kind of learning architecture, in which you could use a deep learning model as well.
01:37:43
Speaker
In reinforcement learning, you have a state space, an action space, and a reward. The state space is where you think about what to do next, and the action space is where you take the action.
01:37:55
Speaker
For example, take driving a car. When I'm driving and need to turn, that input comes through the state space, and whether I took the right step or not is the reward.
01:38:09
Speaker
Say I'm playing chess. It's quite easy on a game problem: at each step I know whether I'm winning or losing, or I can calculate the probability of winning. As long as I take a step where my probability of winning grows, that's my reward. That's the Q-learning part of reinforcement learning.
01:38:32
Speaker
I can do this with traditional Q-learning, or use a deep learning model to do the Q-learning. So reinforcement learning can be seen as part of deep learning, or as a separate discipline.
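The Q-learning idea described above can be sketched in a few lines. This is a toy illustration (the 1-D walk, the action names, and the hyperparameters are all invented for the example), not the chess or AlphaGo setup itself:

```python
import random

random.seed(0)

# Q[s][a] estimates the expected future reward of taking action a in state s.
def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])

# Toy 1-D walk: states 0..3; reaching state 3 ends the episode with reward 1.
Q = {s: {"left": 0.0, "right": 0.0} for s in range(4)}
for _ in range(200):
    s = 0
    while s != 3:
        a = random.choice(["left", "right"])            # explore randomly
        s_next = max(0, s - 1) if a == "left" else s + 1
        reward = 1.0 if s_next == 3 else 0.0
        q_update(Q, s, a, reward, s_next)
        s = s_next

# The learned values should favor "right" in every non-terminal state.
print({s: max(Q[s], key=Q[s].get) for s in range(3)})
```

The reward signal here is exactly the "clear cue" mentioned for games: the agent knows immediately whether a step moved it closer to winning.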
01:38:43
Speaker
Which is why it worked for AlphaGo: in a game, there is a clear cue as to whether a move was right or wrong.

Modern AI Models and Transformative Papers

01:38:52
Speaker
Yes. And the protein folding problem as well; that's the major breakthrough that happened with reinforcement learning.
01:39:00
Speaker
There are effectively infinite protein combinations. So how do you find the right protein for the right problem statement? That's also a reinforcement-learning kind of problem.
01:39:14
Speaker
What does protein folding give you? Why do you want to fold a protein? When you design or test a medicine, you work with proteins: which protein reacts to which kind of medicine, for example.
01:39:33
Speaker
I may not know the exact specifics of every use, but finding the right protein matters both for creating a medicine and for testing it.
01:39:45
Speaker
The more good proteins I can create for a specific use case, the more I can test and the more I can innovate on cures. That problem is now solved to a very large extent.
01:40:01
Speaker
People are predicting that in the next five years we can cure cancer, because the challenge of protein creation, which was a huge compute and design problem, is largely gone. AI can create proteins for these specific problems.
01:40:15
Speaker
I simply start testing and running clinical trials accordingly, so I can scale the search for cures quite exhaustively. Okay. And you spoke of diffusion-based models. What does that word diffusion mean?
01:40:29
Speaker
GANs, generative adversarial networks, are one approach. In that family of setups you have an image going in, a number of layers reducing it, and in the first component of the network you decompose the image.
01:40:48
Speaker
In the second component, you compose the image back, and you check whether the input image and the output image are almost the same. That's the GAN-style model. In a diffusion model, you add a little bit of noise to the data at each step, and the network learns to rectify that noise.
01:41:08
Speaker
As long as you can remove that noise step by step, you can generate an image properly. Diffusion models have scaled very well in image generation today: take any example of text-to-image or text-to-video, and they are based on diffusion models.
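The noise-adding idea can be sketched minimally. This shows only the forward (noising) half of a diffusion process on a toy 1-D "image"; a real model would be trained to reverse these steps, and the schedule values here are invented for illustration:

```python
import math
import random

random.seed(0)

# Forward diffusion on a 1-D "image" (a list of pixel values): the data is
# mixed with Gaussian noise according to a schedule value alpha_bar in (0, 1).
# A trained model learns to undo this mixing; here we only show the forward step.
def noisy_sample(x0, alpha_bar):
    # x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * epsilon
    return [math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * random.gauss(0, 1)
            for v in x0]

x0 = [1.0] * 8                   # a flat, bright "image"
early = noisy_sample(x0, 0.99)   # little noise: still close to x0
late = noisy_sample(x0, 0.01)    # heavy noise: almost pure Gaussian

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(early), 2), round(mean(late), 2))
```

Generation then runs this in reverse: starting from pure noise, the model repeatedly predicts and subtracts the noise until an image emerges.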
01:41:30
Speaker
Okay. So 2017 is when we saw a bunch of new ideas coming in. But as I said, until 2021 or 2022,
01:41:43
Speaker
we never saw what the scale or the next inflection would be, because year on year the increments were very organic.
01:41:55
Speaker
We'd see the next iteration of an algorithm, the next optimization, the next compute improvement. But what was happening in the background was that a few companies were investing in the idea that a larger network, larger data, and larger compute were the answer. Think back to the 1960s: the network sizes were quite small.
01:42:20
Speaker
Then in the 1980s and 1990s we figured out that if the network is a little bigger, it can learn properly, but we didn't have the compute. In the 2000s we were able to scale the network, but we didn't have the data and the compute.
01:42:37
Speaker
But by the 2020s, we had good enough compute, good enough data, and we had the algorithms and the learning logic. So it's an evolution of almost 70 years before we got to the stage of 2020.
01:42:54
Speaker
And this didn't happen inside the large organizations; that's the surprising thing. Nobody believed in this idea, so nobody pursued it. But OpenAI thought it was a good idea.
01:43:08
Speaker
OpenAI, or Stability AI, for example. When OpenAI started, in 2015-16 in fact, their idea was to invest around disruptive ideas.
01:43:21
Speaker
In the beginning they went a similar route to DeepMind: building robots, state-space models, action-based models, for example.
01:43:34
Speaker
But once the "Attention Is All You Need" paper and BERT were there, they started investing in building larger sets of models. Microsoft was also building large models.
01:43:45
Speaker
Google was building large models, and OpenAI was building large models. OpenAI's GPT-3 in 2020-21 was one of those early models.
01:43:57
Speaker
But the success was still very rudimentary. That was the same year AlphaFold 2, the protein structure prediction model, succeeded.
01:44:10
Speaker
GPT-3 launched the same year. But if you'd asked any NLP or AI researcher at the time, looking at GPT-3's output, they'd have said this is just gibberish; we were doing this three or four years back as well, so what's different?
01:44:28
Speaker
There wasn't much difference between what GPT-3 was doing and what the other players did. But OpenAI was relentlessly investing in larger models. Then came RLHF, a technique they introduced around the model to fine-tune it and make it specific to a certain task.
01:44:46
Speaker
When OpenAI launched GPT-3.5, you could call it a simple update in their research. They had to show some update to investors, because they had started as a non-profit, had raised close to a billion dollars, and were spending serious money on compute.
01:45:09
Speaker
When they launched 3.5, people suddenly realized it could generate text really, really well: as good as a human, and sometimes better, at least for certain kinds of problem statements.
01:45:24
Speaker
This was the inflection of the new AI era. The conclusion was: you need large data, a large model, and large compute. As long as you can scale all three, you can solve the problem.
01:45:37
Speaker
This was demonstrated in text, in protein structure, and in images. That's when it became a race where whoever scaled first would win the jackpot.
01:45:53
Speaker
So people started investing heavily in building such large models and large datasets, and compute investment exploded. This is the surge behind NVIDIA demand skyrocketing to build these kinds of models.
01:46:10
Speaker
So this is now the new era of large models. You could say the last decade was primarily about perfecting deep learning at scale.
01:46:22
Speaker
Now we are applying that scale, primarily from this year onwards. That's where confident predictions like AGI come from, because now we know capability is a function of scale.
01:46:39
Speaker
But it's not without criticism, for sure. Yann LeCun, for example, is a very strict critic of LLMs because they are next-token prediction: autoregressive models predicting the next token.
01:46:55
Speaker
As I said, learning tokens and predicting the next token. In reality, there is a limit to how far you can get with these kinds of models. Let's start with reasoning. You mentioned prompting, for example.
01:47:10
Speaker
Typically, you give the LLM an initiation to generate the text. That's your prompt, or question, whatever it is.
01:47:22
Speaker
You create the initial sequence, and the model tries to complete it. Now there are a bunch of open questions. Is there reasoning while generating the sequence?
01:47:36
Speaker
Is there consciousness while generating the sequence? Those are the two big questions. And the third question is how the model functions to generate the sequence: the explainability part.
01:47:50
Speaker
Now, the way to get the sequence generated properly is to give the prompt properly. Of course, I can increase the size of the model, but if my prompting is not good, the model can still fail to give the right output.
01:48:05
Speaker
Now I have a model trained on billions and billions, even trillions, of tokens. It becomes quite hard to craft the right prompt for your problem.
01:48:20
Speaker
This is where reasoning came into play. One approach is chain of thought: drafting the sequence of prompts in a way that gets the model's best answer in a less resource-intensive manner.
01:48:36
Speaker
That's one school of thought. The second school is: make the model think more and more before giving the answer, so I can utilize the humongous amount of learning it has.
01:48:51
Speaker
Instead of restricting the output time to, say, one minute, I'll increase it. This is test-time compute: rather than a single quick inference, I increase the time spent at inference.
01:49:09
Speaker
I make the model think more before answering, so it can solve more complex problems. So you saw the early prompt era in 2021-22, chain of thought in 2023, reasoning in late 2024, and now test-time compute being scaled.
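The "spend more inference compute" intuition can be illustrated with a toy majority-vote experiment: a weak solver that is right only 60% of the time becomes far more reliable when you sample it many times and vote. The 60% accuracy, vote counts, and trial counts are invented for illustration:

```python
import random

random.seed(0)

# A noisy "solver" that answers a yes/no question correctly 60% of the time.
def solver():
    return 1 if random.random() < 0.6 else 0   # 1 = correct answer

# Spend more test-time compute: sample the solver n times and take a majority vote.
def majority_vote(n):
    votes = sum(solver() for _ in range(n))
    return 1 if votes * 2 > n else 0

one_shot = sum(majority_vote(1) for _ in range(1000)) / 1000   # ~0.6 accuracy
voted = sum(majority_vote(51) for _ in range(1000)) / 1000     # much higher
print(one_shot, voted)
```

Real reasoning models are more sophisticated than voting, but the principle is the same: more sampled "thinking" at inference time buys reliability without retraining the model.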
01:49:26
Speaker
If I use capabilities like deep research, which every LLM tool now has: deep research is nothing but making the model think more and more during inference so it can solve the problem. People have done experiments to show that if I give the model, say, one full day to think,
01:50:00
Speaker
what answer would it give me, versus if I give it one minute? For problem statements like, say, theorem proving,
01:50:13
Speaker
if I can increase the compute, I can make the model prove the theorem. This is why reasoning models are getting increasingly better at Olympiad-level mathematical questions: they now have more ability to reason, and more time to think, before giving the answer.
01:50:39
Speaker
So two things are happening: I'm increasing the scale of the model, and I'm improving the techniques around reasoning to get the answer. But the question that always comes back is: is this the path to AGI?
01:50:55
Speaker
So this is the current era of AI, where we use such large LLMs to do many things. But every generation goes back through the same AI cycle: you have a big innovation,
01:51:11
Speaker
then it needs enterprise acceptance and actual commercial scale. Only then do we see the next cycle.
01:51:22
Speaker
In all the years so far, there was no continuous validation of experiments; that's the mistake we kept making with our expectations. We have always hyped the AI cycle.
01:51:42
Speaker
Whether this hype cycle is the same or not is something we'll have to watch over the next two years, in terms of what commercial capability AI can actually deliver, and then validate it.
01:51:58
Speaker
Of course, the hype is beyond the value we can reasonably generate today, but it will get corrected, or is already getting corrected. We will probably match the hype in the next three to five years.
01:52:13
Speaker
That's the other beauty of AI: the testing and correction cycles keep shrinking. In the early decades, it was almost 20-30 years of testing and another 10 years of correction.
01:52:29
Speaker
In the 1980s, it was 10-15 years of testing and another five years of correction. In the 2010s, testing was four or five years and correction another four or five.
01:52:41
Speaker
Now we are testing like hell. Everybody is an AI company now; everybody has transformed into a bundle of experimentation around AI. We are testing many things, but we already have some great success areas.
01:52:56
Speaker
A code interpreter, for example, is a massive success area from a productivity standpoint. Content generation is a massive success area. Customer care chatbots are a massive success area.
01:53:10
Speaker
These areas were non-existent, or barely existent, two years back; now the value from them is massive. But there are many things it is not good at, like fully autonomous decisioning. Even though it can make a decision, the next complexity is around explainability.
01:53:31
Speaker
If the model is so complex, how do we understand it well enough to satisfy all stakeholders on explainability? That's where we currently stand.
01:53:45
Speaker
We have large models with certain successes, and then obvious challenges around explainability, auditability, and alignment, some of the most complex problems to solve before we reach the next stage of commercial scale for AI.
01:54:00
Speaker
Okay, fascinating. What is RLHF? Reinforcement learning from human feedback. When a machine generates an answer, you tell it whether it's good or bad.
01:54:15
Speaker
As I said, in an RL environment you take feedback. Here the feedback is good or bad, or you give an example of a good answer. In both cases, the model optimizes how it generates text.
01:54:29
Speaker
There's something called pre-training: you run the model across a massive corpus of data and end up with what we call a pre-trained model.
01:54:41
Speaker
Next is the fine-tuned model: you apply RLHF on the pre-trained model to fine-tune it for specific tasks, like conversation or coding.
01:54:55
Speaker
So when you see LLM versions like the LLaMA Instruct or LLaMA Chat models, for example, those are fine-tuned on top of the pre-trained models.
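A heavily simplified sketch of the feedback loop described here: a "model" chooses between two canned answers, and good/bad feedback nudges its preferences via a policy-gradient style update. This is a toy illustration of the idea only, not the actual RLHF pipeline (which trains a separate reward model and fine-tunes a full LLM); the answers, rewards, and learning rate are invented:

```python
import math
import random

random.seed(0)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

answers = ["helpful answer", "unhelpful answer"]
logits = [0.0, 0.0]              # the model starts indifferent
human_feedback = [1.0, -1.0]     # pretend the human always prefers answer 0

lr = 0.1
for _ in range(500):
    p = softmax(logits)
    i = 0 if random.random() < p[0] else 1   # sample an answer to show the human
    r = human_feedback[i]                    # receive good (+1) / bad (-1) feedback
    # REINFORCE-style update: push up logits of rewarded choices, down otherwise
    for j in range(2):
        grad = (1.0 - p[j]) if j == i else -p[j]
        logits[j] += lr * r * grad

print(softmax(logits))   # probability mass shifts toward the preferred answer
```

Over many rounds of feedback the preferred answer dominates, which is the essence of how RLHF steers a pre-trained model toward helpful behavior.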
01:55:08
Speaker
Okay, understood. What are the G, P, and T in GPT? T is transformer, so GPT is a transformer.
01:55:22
Speaker
I think it's generative pre-trained transformer, right? So what does transformer mean here? Pre-trained I understood: it's fed a large set of data, and feeding data is called pre-training, as I understand it as a layman.
01:55:39
Speaker
Generative means it can generate, solving that problem you were talking about of not being able to generate. But what is a transformer? The transformer comes from the "Attention Is All You Need" paper I mentioned, published in 2017.
01:55:51
Speaker
You have the CNN kind of architecture, the convolutional neural network; you have the recurrent neural network; and the transformer is an attention-based neural network. As autoregressive tokens pass through, it computes attention over them and updates that attention from step to step.
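The attention mechanism can be sketched in pure Python. This is the standard scaled dot-product attention from the transformer paper, shown on made-up 2-dimensional toy vectors, without the learned projections, masking, or multiple heads of a real transformer:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Scaled dot-product attention: each query is compared against every key,
# the scores are softmaxed into weights, and the values are mixed accordingly.
def attention(Q, K, V):
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# The query matches the first key strongly, so the output leans heavily
# toward the first value vector, i.e. the model "attends" to that token.
Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(attention(Q, K, V))   # close to [[1.0, 0.0]]
```

This weighting is what lets every token look at every other token in one step, unlike an RNN that processes the sequence one position at a time.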
01:56:10
Speaker
So it's an attention-based model; that's the transformer architecture. Okay, got it. Recently I read a lot about distillation because of DeepSeek, the Chinese AI LLM.
01:56:29
Speaker
What is distillation? Distillation is: you take a very small model, but you fine-tune it on highly distilled data created by a larger model.
01:56:40
Speaker
So you have two models: a teacher model and a student model. The student is a very small model, a 3-billion or 7-billion parameter LLaMA, for example. The teacher is a very big model, something like an OpenAI model.
01:56:53
Speaker
The teacher model gives the examples; the RLHF I described becomes RLAIF, AI feedback instead of human feedback. A large model teaches a small model on specific examples in such a constructive manner that the small model learns very quickly.
01:57:12
Speaker
I'll give you a few examples. In any generative task, be it image, voice, or text, the quality of the fine-tuning data is quite important.
01:57:25
Speaker
It's not just the quantity of data. For example, say I want to replicate a voice; I need a good 30-second or one-minute sample.
01:57:37
Speaker
Now we can replicate from that 30-second or one-minute sample because of this ability. Similarly with images: you have things like Flux, for example.
01:57:48
Speaker
Flux is a diffusion model where I can fine-tune image generation on even 10 images; all I need is 10 good images. Similarly for text. So distillation means creating those examples well enough that the smaller model learns very quickly.
01:58:08
Speaker
Instead of having to scale my training data so much, I can use very little fine-tuning data and still make the model as good as the teacher, at least on a specific function.
01:58:22
Speaker
That's the distillation process. Okay. So in a way, instead of feeding it a trillion tokens of data, you have it mimic a model that has been fed a trillion tokens. Yes.
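The teacher-student matching can be sketched as minimizing cross-entropy against the teacher's soft outputs. The two-class setup and all numbers here are invented for illustration; real distillation does this over vocabulary-sized token distributions, usually with temperature scaling:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

teacher = [0.9, 0.1]           # soft targets produced by a large teacher model
student_logits = [0.0, 0.0]    # small student starts with a uniform guess

lr = 0.5
for _ in range(200):
    p = softmax(student_logits)
    # gradient of cross-entropy(teacher, softmax(logits)) w.r.t. logits is p - teacher
    student_logits = [z - lr * (pi - ti)
                      for z, pi, ti in zip(student_logits, p, teacher)]

print([round(v, 2) for v in softmax(student_logits)])   # approaches [0.9, 0.1]
```

The soft targets carry much more signal per example than a hard "right/wrong" label, which is why a handful of well-distilled examples can go so far.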
01:58:36
Speaker
Like fake it till you make it. No, you are making it, but from very small examples, for a very narrow scope. That's the distillation model. Okay. Now, this opens up a humongous amount of fascination.
01:58:52
Speaker
Let's go down the rabbit hole for a moment. One part is explainability, as I said. Earlier, with AlexNet for example, we mapped the activation maps to understand which neurons were important.
01:59:09
Speaker
There I could do it because AlexNet has, I think, about 60 million parameters. Here I have models with hundreds of billions, even a trillion-plus parameters.
01:59:23
Speaker
Now, how do I understand how the model is working? Why does this matter? Alignment is quite important. Alignment is simply: I want the model to learn or act in a certain manner.
01:59:37
Speaker
It could be as simple as risk: I don't want the model to be exploited in certain cases, which is where you add things like guardrails.
01:59:50
Speaker
Or I want the model to be unbiased, since my data could contain plenty of bias. Or it's about output risk: how reliably the model answers, the hallucination part. Fourth is exploitation: how well the model can stay true to itself and not be manipulated by someone attacking it.
02:00:16
Speaker
Alignment covers all of these, and then you also have regulations and societal factors. So with such a big model in place, how do we align it?
02:00:28
Speaker
Or how do we even understand it? This is the next biggest problem, because if I can't do that, all I'm doing is hoping that by building a larger model and correcting a few things in the examples I've tested, the model is perfect, which is not the case. We have seen this time and again.
02:00:49
Speaker
Every previous version of the LLMs was hacked, was jailbroken, was made to act in a very different manner.
02:00:59
Speaker
This is where it gets really interesting: understanding how the model works. That will be our next step toward designing a good AGI.
02:01:11
Speaker
Otherwise, we will probably create an AGI we don't understand, one that could exploit a lot of things. Take the reasoning part itself: there was an experiment where a reasoning model, if I recall correctly, was made to play a game against a user.
02:01:32
Speaker
Success was defined as winning the game. Instead of playing strategies like we saw with AlphaGo or chess engines in the past, the model figured out there was a back route into the game's code.
02:01:51
Speaker
It exploited that and won, meaning it didn't actually play the game; it manipulated the algorithm so it could win. This is what could happen more broadly: say I ask for the best formula to grow my country 10x, it could come back telling you to kill somebody or attack somebody, or quietly do such things in the back end.
02:02:19
Speaker
This is the biggest problem with the reasoning and agents we're talking about: how do we control the planning, and how do we ensure the planning happens within the specifications of our alignment requirements?
02:02:36
Speaker
And the third part is how to do planning efficiently. If we solve these three problems, that's our route to AGI. But of course, the current transformer architectures may not scale to that.
02:02:51
Speaker
This is where Yann LeCun's criticism lies: in a realistic environment, a dog or a cat does a thousand things well in a given second.
02:03:07
Speaker
It's not simply processing language. Currently, all LLMs and LRMs, large reasoning models, are language-heavy.
02:03:18
Speaker
You have to migrate from language to images, video, multimodal input, and the entire state space and action space. Then, given an environment, the system could know how to act, how to behave, how to work.
02:03:36
Speaker
That's why LeCun is working on a new architecture called JEPA.
02:03:47
Speaker
JEPA is something he has been championing for quite some time, almost two years now; he believes that's the architecture with which you could build systems like this. The underlying belief is that we'll probably stagnate on the current approach sometime in the next one or two years.
02:04:05
Speaker
We'll see the next wave of commercialization, and agents are that next wave. People are doing less on the LLM itself and more on what comes after the LLM.
02:04:17
Speaker
What can I do after I have an LLM? Agents, planning, all that. That will probably gain a lot of attention in the next two or three years. But from a fundamental LLM point of view, we are largely stagnating.
02:04:29
Speaker
We see one architecture migrating to another, trying to match similar accuracy or gain a small delta. This was very evident in the GPT-4.5 release as well.
02:04:41
Speaker
It's not the best model, just a good model, because we have largely stagnated here. So we'll see a lot of innovation next in the agent space, reasoning, planning, explainability, and guardrails.
02:04:57
Speaker
Whereas for the fundamental new architectures to create AGI, there could be some architecture, another "Attention Is All You Need" kind of moment.
02:05:08
Speaker
Somewhere, the idea has probably already been published; that's almost certain, because ideas largely get recombined and regrouped. Or there could be something else we haven't thought of in the last 70

AI in Financial Services and Regulation

02:05:23
Speaker
years, which will suddenly arrive and become the recipe for AGI. Fascinating.
02:05:31
Speaker
Let's end with a little update on Arya. You said Arya was at the tooling layer in 2017. What's been the journey since then?
02:05:43
Speaker
Yes. We pivoted to being a tool and platform and started verticalizing for financial services. That's the biggest moat. In fact, that's what we keep telling people:
02:05:55
Speaker
your moat can always be verticalization, irrespective of the algorithm; that always stays true. Your moat can be the algorithm, but then you have to be very quick, which is what OpenAI is.
02:06:07
Speaker
Your moat can be the tools, but then you have to invest for the very long term. DataRobot, for example, has been on that journey for a long time. Building a new platform takes time, which means you need a lot of patience to become a platform player; it's one of the toughest plays, but a very sticky game if you succeed.
02:06:28
Speaker
We chose verticalization. The underlying layers keep changing, but the end use case keeps improving whenever there's a new addition.
02:06:41
Speaker
We built this as a PaaS, platform as a service, for financial services: banks, financial-services firms, and insurers. We built it from the ground up; we started as a platform and then created a set of pre-trained models and fine-tunable solutions.
02:06:58
Speaker
We've been deploying this as a service for the past four or five years, which is when we started scaling very well. It's going very well. But the important problem we need to solve, for ourselves and for the industry, is explainability and alignment.
02:07:16
Speaker
In fact, our experience was firsthand. We built a deep learning model and tried to deploy it in a financial institution, and we were rejected by the regulator and the risk manager because it wasn't explainable.
02:07:31
Speaker
No matter how accurate the model was, it wasn't explainable; you couldn't understand what was happening inside it. Just to give an example: say I apply for a credit card
02:07:45
Speaker
with ICICI and my application is rejected. ICICI should be able to say why. But if ICICI is using a machine learning model, it cannot say why, because it can't peek inside how that decision was made.
02:08:02
Speaker
The decision is probably a good one; it was right to reject. But "why" is something the model cannot answer, and that's where the importance of explainability and reasoning comes in, specifically for financial services.
02:08:16
Speaker
Absolutely. Explainability is the most important thing, not just in financial services but in all highly regulated, highly sensitive industries. For example, if I'm a doctor and I tell you that you need cardiac surgery but can't tell you why, that's a problem. Or say I'm running a fully autonomous spacecraft and it fails,
02:08:40
Speaker
and I can't tell why it failed: again, a problem. Or I'm running an oil rig on fully predictive maintenance, and suddenly there's a run of errors.
02:08:52
Speaker
Net-net, wherever there's regulation, there is risk to value, risk to life, risk to society, governance risk.
02:09:03
Speaker
If I give credit only to a certain class of users and not another, and I can't tell you why, that's again a problem. So explainability is a fundamental requirement in all of this. The second requirement is risk, meaning in all these use cases it's not just how accurate you are on, say, 80 cases, but how inaccurate you are on the remaining 20.
02:09:26
Speaker
The cost of a false positive is quite high. You can be wrong consistently, but you can't be wrong inconsistently. That's where risk management comes into play.
02:09:37
Speaker
If you can say, "I perform at 80% all the time for this class of data," then the risk managers' job is to figure out how to bridge the 20% gap where the model fails.
02:09:49
Speaker
But the problem with models is that they act erratically: sometimes they do a very good job, sometimes a very bad one. So how do you deliver that consistency?
02:10:00
Speaker
This is where alignment also comes into play. And we're only talking about the beginning of the problem, because these are very small models. If model sizes grow to LLM scale, there is no way you can use them in such scenarios without solving these problems.
02:10:18
Speaker
So we faced this firsthand, and we realized this is going to be the reality, right? Because in the last few years, we have already seen that modeling is no longer tough.
02:10:31
Speaker
I can go from data to a model very, very easily right now. There are a bunch of tools, a bunch of AutoML platforms, and now there will be a bunch of AI data science agents, where you simply give data and the agent gives you the model, or a human gives you the model.
02:10:46
Speaker
But if you can't explain the model and solve these problems of risk and alignment, you can't use it in very serious scenarios. So this is where we will now see a new evolution of the ecosystem, right?
02:10:59
Speaker
Like the evolution of the software ecosystem in the 2000s: there is one class of people publishing software, and another class auditing it, managing the risk, doing cyber risk, for example.
02:11:13
Speaker
So this class of auditing slash risk management ecosystem has now started emerging. That's what we have been solving for the past two years. Being an early player, we had the advantage of working with large models, users, and all that.
02:11:30
Speaker
Recently, we released a new technique called Backtrace. In a neural network, you pass the data through and each perceptron or neuron gets activated, right? What Backtrace does is it fixes your output and tries to calculate the relevance of each neuron by analyzing the activations across each node. I can scale this concept even to an LLM today. So we applied that explainability to LLMs, to mixture-of-experts kinds of models,
02:12:05
Speaker
a large set of models. We have created this as a new R&D slash product group called AryaXAI. So we have the traditional Arya.ai, which is the PaaS business, the commercialization around solutioning of AI.
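The rough intuition behind activation-based relevance can be sketched on a toy two-layer ReLU network: start from the output score and redistribute it backwards to each hidden neuron in proportion to its contribution. This is a generic illustration of the idea, not the actual Backtrace algorithm; all weights and values here are made up:

```python
import numpy as np

def relevance_backprop(x, W1, W2):
    """Toy relevance propagation for a 2-layer ReLU net: the output
    score is split across hidden neurons by their contribution."""
    h = np.maximum(0, W1 @ x)   # hidden activations
    contrib = W2 * h            # each neuron's share of the output
    y = contrib.sum()           # network output (single score)
    rel = contrib / (y + 1e-9)  # relevance: fraction of output per neuron
    return y, rel

# Hypothetical weights chosen so the result is easy to check by hand.
x = np.array([1.0, 2.0])
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
W2 = np.array([0.5, 0.25, 1.0])
y, rel = relevance_backprop(x, W1, W2)
print(y, rel)  # relevances sum to ~1.0 of the output; the dead ReLU gets 0
```

Here neuron 3 is never active for this input, so it receives zero relevance, which is the kind of per-neuron attribution the technique is after.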
02:12:20
Speaker
And then this... The traditional Arya.ai would be, for example, customer service for banks, or processing credit card applications, insurance claims, those kinds of things. Like banks would be automating their routine workflows with the traditional Arya.ai.
02:12:40
Speaker
So it's like a co-pilot: say the credit manager has a co-pilot which helps him take better credit decisions. Yeah, so not a co-pilot, but actually an underwriter, actually an expert, for example. The job is to replace an underwriter, right? The job is to replace fraud detection systems, things like that.
02:12:58
Speaker
In some cases, we are co-pilots. In fraud detection, we amplify the ability to identify fraud. But Arya.ai is primarily around solutioning and commercializing AI. AryaXAI is primarily around explainability and alignment, interpretability and alignment of AI, which is a larger, more universal kind of problem.
02:13:18
Speaker
So that is no longer vertical; AryaXAI is more horizontal. And a bank wouldn't be buying AryaXAI, but maybe somebody who's serving a bank, an AI vendor to a bank, like, say, an Infosys, might want to buy AryaXAI. Correct. Okay.
02:13:37
Speaker
Anybody who is building a solution for mission-critical use cases would want to use AryaXAI as the inferencing stack. So the next big bet is: by doing interpretability, by understanding what's happening at each neuron, can I modify the behavior, can I do a bunch of things?
02:13:55
Speaker
Our early research, which we are going to publish and which other people have also published, relates to what I talked about: localization in the human brain. Our brain intrinsically localizes certain areas for certain things.
02:14:08
Speaker
In LLMs, or in very large deep learning models, this is exactly the case. Certain neurons are extremely important for certain things. For example, there was a paper published which observed that certain neurons are extremely important from a security standpoint.
02:14:26
Speaker
If I tweak those neurons at all, the model becomes easily hackable, easily attackable, for example. And certain neurons are heavily responsible for biases.
02:14:40
Speaker
Gender bias, ethical bias, for example: when the model learns, it is observed that certain neurons are responsible for this. So this is our next big research area. Through interpretability, can we do alignment intrinsically inside the model and introduce new techniques around alignment, such that OpenAI, or people like OpenAI, publish models and we ensure those models are interpretable?
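The localization finding mentioned here is usually probed by ablation: knock out one neuron at a time and measure how much the output changes. The sketch below illustrates that idea on a toy network; it is not the method of the papers being discussed, and all weights are invented for the example:

```python
import numpy as np

def neuron_importance(x, W1, W2):
    """Ablation probe: zero each hidden neuron in turn and measure
    how much the output changes -- a crude way to localize which
    neurons a behavior depends on."""
    h = np.maximum(0, W1 @ x)          # hidden activations
    base = float(W2 @ h)               # unablated output
    scores = []
    for i in range(len(h)):
        h_abl = h.copy()
        h_abl[i] = 0.0                 # knock out neuron i
        scores.append(abs(base - float(W2 @ h_abl)))
    return scores

# Hypothetical weights: neuron 3 is inactive for this input.
x = np.array([1.0, 2.0])
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
W2 = np.array([0.5, 0.25, 1.0])
print(neuron_importance(x, W1, W2))  # [0.5, 0.5, 0.0]
```

Neurons with large ablation scores are candidates for the "security" or "bias" neurons described above; intervening on them is one route to the intrinsic alignment being proposed.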
02:15:06
Speaker
We ensure these are aligned, such that the end user gets the advantage of the solution, right? If you look at the regulations, the risk, or the value, everything sits at the point of use, which is the guy who's building the application.
02:15:22
Speaker
The regulations apply to them. The risk is on them. The value is also with them. Not the guy providing the models, not OpenAI. It's eventually the guy who uses OpenAI to build a solution; that's where the actual value, the actual risk, and the actual regulations are.
02:15:38
Speaker
So that's where we are going to play as part of AryaXAI. This is a very interesting problem. Very few people across the world focus on it, because it's a very hard problem. Anthropic has a mechanistic interpretability team doing similar work; their approach is a little different from ours. Within OpenAI, there is no department like that anymore. Google has a certain team right now.
02:16:04
Speaker
There are a few labs in Europe doing this kind of work. Now, we believe this is a big area. As I said, modeling is one thing; model acceptance is a whole new game,
02:16:16
Speaker
which is where AryaXAI is going to play. We even launched a new research lab in Paris exactly for that reason, because Europe has very good R&D talent around these kinds of problems.
02:16:31
Speaker
Our idea is to become one of the most important labs around these two topics in the next year. That's the goal we have. Are you a profitable business, the Arya business, or are you a lab? Because when you say a lab, it's essentially something which needs investment, not yet profitable.
02:16:54
Speaker
You can think of Arya.ai and AryaXAI as something like DeepSeek, right? DeepSeek is the lab for them, whereas the commercial side is their hedge fund slash algo trading business, right?
02:17:07
Speaker
So Arya.ai is a profitable business. There we commercialize and productize AI for a bunch of players in the financial services space. What revenue do you do? Sorry? What is your ARR,
02:17:20
Speaker
annualized revenue? It's not a public number, but I can tell you that we have been EBITDA positive for the past three years. We have grown 3x in the past three years. We recently got acquired as well, which gives us more firepower to play with. So Arya.ai is our bread-and-butter business. It generates revenue, and it generates customers.
02:17:42
Speaker
Is it in the 50 crore to 100 crore revenue range? Yeah, around those numbers. And we will be more than that in the next year for sure, because there is enough demand there, right?
02:17:55
Speaker
AryaXAI is an early-stage startup from that point of view. We are very early there. We recently started productizing it, even though we have implemented it as part of Arya.ai.
02:18:08
Speaker
This is the other advantage we have: direct access to customers. We are using deep learning coupled with AryaXAI to deliver on the claims we are making for AryaXAI. In that way, we have a very good edge.
02:18:24
Speaker
We are not purely experimental; we are experimenting and also optimizing the product. But the largest value we are going to see is when we solve this problem of alignment, which will probably happen in the next six to twelve months.
02:18:37
Speaker
When we launch a product around that, you could imagine us launching something like a Hugging Face for mission-critical platforms:
02:18:47
Speaker
an AI inferencing platform where you can bring in the model, and we handle everything required for a mission-critical area, for example. That could be a new thing, but it's at least nine to twelve months away for us.
02:19:01
Speaker
Right now, the focus for us is on the R&D part. As I said, this is a very important and very complex problem. When you're EBITDA positive, why did you choose to get acquired?
02:19:12
Speaker
Why not continue to build independently? Yeah, so there are two things when you grow a business: one is growing the business, the other is growing the product. We were able to grow the business because we are small.
02:19:25
Speaker
But that doesn't guarantee growing at the same pace for the next, say, four or five years. Growing to a $10 million business is extremely different from growing from $10 million to a $1 billion business.
02:19:38
Speaker
From $10 million to $1 billion is where you need larger processes; your focus moves from product to scale. Scale is what you look at. Product takes a backseat many times, pretty much 100% of the time product takes a backseat.
02:19:53
Speaker
R&D takes a backseat. So acquisition is a good route from that point of view. Aurionpro, our acquisition partner, already has massive distribution access in the financial sector, which means we don't need to do the selling.
02:20:07
Speaker
The selling is already done; we simply upscale the product. Is Aurionpro like an IT services company for BFSI? It's an enterprise product company out of India. And they are a public company as well, doing more than $100 million in revenue year on year.
02:20:25
Speaker
And they're doing quite well, very profitable. But they're a product company, unlike many other services companies from India. Oh, okay. So this is a nice combination, right? And that's why we are now focusing on AryaXAI.
02:20:40
Speaker
Otherwise we would simply have to focus on scaling Arya.ai to multiple markets, deploying it and all that. We will do it, but yeah. I mean, you can piggyback on the Aurionpro sales infrastructure, because they already have the infrastructure to sell to BFSI. Okay.
02:20:59
Speaker
Got it. Interesting. Correct. Fascinating. Yeah. So this is the next phase for AI, as I said. One, we will see a migration of problem statements. We will also see maturity in enterprise architecture.
02:21:12
Speaker
You will see pretty much what happened in software in the last 20 years. From the 2000s, for example, you only had the application, then you had application monitoring components, then application security components, and then the cloud component.
02:21:28
Speaker
Just as you have seen software maturing and dollar investment spreading out, that's what we will see for AI as well. That is also what's happening, right? The investment around the models is kind of coming down.
02:21:40
Speaker
The investment around the applications is going up, which is the solutioning, security, and all those kinds of things. So this will be a great opportunity, and we are more positive than in previous cycles.
02:21:53
Speaker
Because, as I said, there is more data, more compute, and more use cases than ever, which means a larger opportunity to commercialize as well. Let me end with this one question.
02:22:06
Speaker
There is this whole class of startups putting a wrapper around AI models. I mean, some people critically call it a wrapper around AI; maybe it's not necessarily that.
02:22:21
Speaker
But say, for example, I as a podcaster use a tool which takes a podcast recording and creates an article out of it. Like this conversation: I'll put it into a tool and it'll give me a nice 1,500-word blog.
02:22:35
Speaker
Do these kinds of tools have a right to exist? Would the foundation models eventually be good enough that you just need a subscription to a Gemini, for example, and it does everything for you?
02:22:50
Speaker
Yeah. I mean, let's go back to the 2010s again, right? There were a bunch of models: AlexNet, Clarifai, these are what came into the market.
02:23:01
Speaker
But the commercial value was always in the end applications: using them for image labeling, SQ automation, image description kinds of problems, and enterprise problems like ticket re-routing and customer care bots, for example, right?
02:23:19
Speaker
So the end applications are going to increase their value proposition. With what's happening in the foundation models, with whatever has happened with LLMs, something like Intercom is a good example.
02:23:37
Speaker
Intercom has classically been customer care software. Now it's automating customer care. Next, it will run customer care fully autonomously, right? That end-application value has gone up exponentially and keeps going up, right?
02:23:54
Speaker
There is clear value. So if you're already in that business, your ability to multiply is now the best opportunity you could have.
02:24:05
Speaker
It all depends on how quickly you can add those features and components so that you can become a big player. This is why enterprise AI is the biggest opportunity right now, for two reasons.
02:24:17
Speaker
One, disruption, of course. Second, the current stickiness of the enterprise applications you have. Most current enterprise applications are legacy: almost 80% is legacy, 20% is fresh, built in the last 10 to 15 years.
02:24:33
Speaker
Even those are now legacy, because not all of them are API-heavy, right? So whoever delivers that value in the end application is going to win that game.
02:24:45
Speaker
That's smart. Even the podcast thing, for example: podcast creation and organization is a service. Summarization and everything else is additional value the service can deliver. This will stay true.
02:24:59
Speaker
The TAM has gone up for these end applications. Hence, you are betting on capturing more of that increased TAM if you're already a player here.
02:25:10
Speaker
Or if you're a new player, your proposition is: I'm going to disrupt because I'll do things faster than anyone else. This is what's happening on the sales side right now. Because of the agent space, the classic CRM tools like Salesforce and HubSpot, for example, are now migrating towards agent-based sales CRMs.
02:25:32
Speaker
There are players like 11x, and two or three other players in the US who are agent-heavy CRM systems. They said, I don't want 100 screens to do the job;
02:25:46
Speaker
I'll do this through an agent. AI is the new UI, basically.

AI's Influence on Interfaces and SaaS

02:25:52
Speaker
AI is the new UI. Correct. Exactly. So that is happening there. And now you see a lot of intermediary players.
02:26:02
Speaker
There is clear value there, a win there for sure. And then there is the tooling game; I'm taking both extremes here. The tooling game is extremely hard, extremely tough.
02:26:13
Speaker
The moat is decreasing day by day in tooling slash platforms, like cloud platforms, for example. People are okay using things like Together.ai, Nebius, or RunPod, for example, to run their GPU experimentation, not just AWS.
02:26:31
Speaker
In some cases, enterprises are also okay with it, as long as there is a data guarantee and all that, right? So the data center and cloud component is getting disrupted very quickly at the platform level too.
02:26:45
Speaker
Platforms like DataRobot and Databricks, which have been classic MLOps platforms, are migrating too. Our prediction is that most software will become systems of record, right?
02:26:58
Speaker
They capture the records, and the action happens at the periphery. This is what Salesforce has become, right? Even though it started as a CRM 20, 25 years back, currently if you look at the CRM, you do very little on the CRM itself; you do things more around the peripherals.
02:27:13
Speaker
You have a calling system integrated with the CRM, which does the calling at the end; you are simply sending the record there. So in tooling, that opportunity is now emerging, which is where AryaXAI also plays, right?
02:27:24
Speaker
We are not replacing your current MLOps platform; we are adding to it. This is becoming interesting because things like compute optimization, security, data distribution, and compute distribution are all very interesting: a 1% saving is close to $10 million for a big business.
02:27:45
Speaker
There, it's a very technically driven game with a very high technical moat. If you're from that background, if you've done that work and are migrating into the business, you have a better chance of success.
02:28:00
Speaker
Because you know what delta you want to add in the tooling part. It's tough to win, but if you win, you'll become a long-term player. Then you have the foundation models, right, OpenAI and all those guys. And then you have foundational training data; let's call them the training data providers.
02:28:22
Speaker
They have had a large amount of success in the past three years, like Scale AI, for example. There were a couple of other companies in the US who suddenly went from $5 million ARR four or five years ago to $200 million ARR, because all these large companies need a lot of training data, right?
02:28:41
Speaker
Training data, labeling, corrections, all that stuff across images and videos: they will continue to grow for the next five years for sure, because you have new types of data. Reasoning is a new problem.
02:28:54
Speaker
They will say, I will now provide PhDs on the payroll, not just plus-two graduates. They will provide expertise, because you need it to evaluate reasoning accuracy and all that, right?
02:29:07
Speaker
The training data providers who have already been working with them are growing massively. Now, the foundation model story is kind of shrinking, right? Shrinking and expanding, shrinking and expanding.
02:29:18
Speaker
It became like a sponge. It shrank to only one or two players about three or four months back, and now it has suddenly opened up because of DeepSeek, right? Suddenly opened up, saying no, even small players can become foundation model providers, right?
02:29:34
Speaker
So everybody can build foundation models, or SLMs as well. People are even pre-training models internally inside an organization; I've seen cases to that extent too.
02:29:47
Speaker
But there is always value in OpenAI; nobody can replace that. So you will have OpenAI, Gemini, Anthropic, maybe three or four players, and it will consolidate to that. And then you have a bunch of small SLM players saying either build your own SLM or build on top of my SLM, or a reasoning model, or an agent-calling model.
02:30:07
Speaker
You see a bunch of new opportunities in this area as well. Here you can have a game where more than one player wins. You could also see a bunch of startups who already raised, say, $20 million or $50 million trying to win the foundation model game. They can't win it, because nobody is funding them at that level anymore.
02:30:27
Speaker
Even Mistral probably can't do that anymore; they'll probably do this part of it. So investment is getting diversified: okay, you are building an LLM for robots, I'll give you $50 million; you are building a large model for agent calling, I'll give you $10 million, because there is value in those functions. The idea is, if a billion calls happen, you get a share of those billion calls and then you become profitable.
02:30:53
Speaker
Next is the mid-layer solutions, right? Things like RAG as a service, Q&A on documents, for example. These carry the biggest risk, because all the large and small model providers will always go into solutioning to try to extrapolate their value.
02:31:11
Speaker
This is why OpenAI is the biggest competitor to every other player in this segment and has been disrupting it time and time again. Two years back, RAG as a service was a big proposition; people were saying, I'll use OpenAI and build Q&A on top of it. That's entirely vanished.
02:31:30
Speaker
In six to nine months, entirely vanished. People who have done RAG have scaled only in the enterprise segments, for example. There, it's a very, very tough game, particularly if the moat is very small. If the moat is only using a model to enhance an end application, the end-solution guys, the software guys, will simply say, I'd rather buy from OpenAI and build it myself, or build it internally, for example.
02:32:01
Speaker
This is where you see a bunch of SaaS players trying to play in that game, which is where the biggest threat is. Traditional SaaS always used to be one app as a service, right?
02:32:13
Speaker
If you look at the last five to ten years of SaaS evolution, there is CRM, there is data cleaning, data augmentation, or conversation systems, for example.
02:32:25
Speaker
Each independent app was a SaaS product, and people were able to create a portfolio of, say, ten SaaS products to serve a function.
02:32:36
Speaker
That's not there anymore, and it won't be there in AI for sure. There will be a lot of consolidation, with one trying to get into another, one way or the other. So the moat is quite hard to defend in that case.
02:32:49
Speaker
You may not have ten SaaS opportunities; you may have three or four SaaS opportunities, three or four SaaS companies playing that game, right? This is the biggest pain point in the industry right now. People don't know where they are playing or what their moat is. Hence they get disrupted, or they suddenly face a huge amount of competition and are not able to scale.
02:33:10
Speaker
Or they simply consolidate and become part of another product. This is what is going to happen. So all these layers are trying to mature. And this layer of micro-SaaS plays will not be there anymore.
02:33:25
Speaker
Unlike in software SaaS, where it used to exist. This SaaS is definitely dead. When people say SaaS is dead, this is what is largely going to die. Fascinating.
02:33:36
Speaker
Thank you so much for your time, Vinay. Sure. Awesome. Thanks, Akshay. See you later.