
Why Machines Learn: The Math Behind AI

Breaking Math Podcast
4.4k plays · 4 months ago

In this episode Autumn and Anil Ananthaswamy discuss the inspiration behind his book “Why Machines Learn” and the importance of understanding the math behind machine learning. He explains that the book aims to convey the beauty and essential concepts of machine learning through storytelling, history, sociology, and mathematics. Anil emphasizes the need for society to become gatekeepers of AI by understanding the mathematical basis of machine learning. He also explores the history of machine learning, including the development of neural networks, support vector machines, and kernel methods. Anil highlights the significance of the backpropagation algorithm and the universal approximation theorem in the resurgence of neural networks.

Keywords: machine learning, math, inspiration, storytelling, history, sociology, gatekeepers, neural networks, support vector machines, kernel methods, backpropagation algorithm, universal approximation theorem, AI, ML, physics, mathematics, science

You can find Anil Ananthaswamy on Twitter @anilananth and his new book “Why Machines Learn”.

Subscribe to Breaking Math wherever you get your podcasts.

Become a patron of Breaking Math for as little as a buck a month

Follow Breaking Math on Twitter, Instagram, LinkedIn, Website, YouTube, TikTok

Follow Autumn on Twitter and Instagram

Follow Gabe on Twitter.

Become a guest here

email: [email protected]

Transcript

Podcast Introduction

00:00:10
Speaker
Welcome to Breaking Math, the podcast where we can freely question why machines learn, and where we get to learn more about the mathematics behind your favorite AI tools like ChatGPT, Sora, and Gemini.

Introducing Anil Ananthaswamy

00:00:22
Speaker
Today, we are joined by Anil Ananthaswamy. He's an award-winning science writer and former staff writer and deputy news editor for the London-based New Scientist magazine. He was the 2019-2020 MIT Knight Science Journalism Fellow.
00:00:40
Speaker
He writes for Quanta, Scientific American, Nature, New Scientist, and many other publications. His first book, The Edge of Physics, was voted book of the year in 2010 by the UK's Physics World, and his second book, The Man Who Wasn't There, was longlisted for the 2016 PEN/E.O. Wilson Literary Science Writing Award. His third book, Through Two Doors at Once, was named one of Forbes' 2018 best books about astronomy, physics, and mathematics. His new book, Why Machines Learn: The Elegant Math Behind Modern AI, was just published today. Anil trained as an electronics and computer engineer, at IIT Madras for his bachelor's degree and at the University of Washington, Seattle, for a master of science in electrical engineering, and later worked as a distributed systems software engineer and architect before switching to writing. He studied science journalism at UC Santa Cruz. Now, with Anil's impressive background, let's dive into the rich history of mathematics that drives machine learning.
00:01:55
Speaker
Hi, Anil, how are you doing today? Hi, Autumn, I'm doing well, thank you. Thank you for coming on. Now, out of curiosity, where did you start with the book originally?

Anil's Journey and Inspiration

00:02:10
Speaker
What was the inspiration for it? I know that Why Machines Learn has a fair bit of math in there. Tell us about why you wanted to write a book that had a lot of math in it.
00:02:27
Speaker
So I've been a journalist for about 20 years. I used to be a software engineer before I became a journalist. And for a long time I was writing on topics like astrophysics, cosmology, neuroscience, particle physics. All the cool stuff. All the cool stuff, yes. When I would write about those topics, I would try to understand the science as best I could and then write about it, but I never had any illusions about being able to actually do that work myself. Then, starting around 2016, 2017, I noticed that I was writing more and more stories about machine learning.
00:03:10
Speaker
And I think every time I would interview the sources and start talking to them about what they were doing, the software engineer part of me woke up. It had been dormant for 15, 20 years. And so some part of me really wanted to learn how to code again. In 2019, I got a fellowship at MIT, the Knight Science and Journalism Fellowship, and as part of that fellowship, I did a project learning how to code. It felt like being a student again among all these youngsters.
00:03:47
Speaker
It was intense. As I started coding, I realized that I really wanted to learn the basics of machine learning, not just the coding. So I started sitting in on lectures; one particular professor at Cornell was especially infectious in his enjoyment of the subject. And it was while doing that that I realized the mathematics underlying machine learning is so beautiful. And then the writer in you woke up and said, oh, I have to communicate this to readers. Precisely. So that's how it came about. It's the book I would have liked to read myself when I was first learning machine learning, with a foundation of storytelling, journalistic storytelling, history, sociology, and the math.
00:04:38
Speaker
So hence the book, full of equations. I think that's crucial for anybody who either needs a refresher or is switching fields, anything of that sort. And sometimes, even when you're really advanced in the field, we need to take a couple of steps back and think about the story behind it

Why Understanding AI Math Matters

00:05:10
Speaker
and have that appreciation for it. Because I know I'm an industrial engineer by training and I deal with a lot of robotics, but when I was teaching a course before, you kind of have to pick and choose what you want to teach somebody for simplicity's sake. And I think this book is crucial for people to have; even if you just need a refresher, it really highlights everything that you need to know.
00:05:45
Speaker
Of course, this is my subjective take, based on a lot of research and conversations with the experts involved, on what I found to be the beautiful and essential parts of machine learning. I'm sure someone else might have a slightly different take on what is crucial. You can only put so much into a book; it's already almost 450 pages long. So I had to pick the most salient and crucial algorithms. But my intent was to trace the conceptual arc of machine learning, to tell the story of the ideas that underpin ML, rather than focus on detailed architectures, because those keep changing with how much has happened lately. So yeah, I agree with you. The book can serve,
00:06:40
Speaker
you know, as a refresher for someone who has a background in it, but also for people who have maybe high-school-level mathematics and want to understand the subject more deeply than they can from reading popular science magazines. Absolutely. Absolutely. And it's beyond that. Now, besides appreciating the beauty and elegance of some of the basic math behind machine learning, what were the other reasons for writing the book?
00:07:12
Speaker
You know, watching how AI is developing and watching the big companies that are becoming the gatekeepers for this technology, I feel like we need a layer of society, apart from the people building these systems, that really understands what's happening, at least a certain amount of the math. Because it's only when you look at the math behind machine learning that you start understanding both the power that is inherent in these systems and also their limitations.
00:07:52
Speaker
So we need, you know, communicators like me, journalists, policymakers, teachers, and really anyone in society who is willing to engage with this technology at the level of at least the basic math. So we need society, too, to be among the gatekeepers, not just the people who are building the systems. For me, that's really important: finding a good way to lay out the mathematical basis for this. I agree, because so much decision-making is made now through AI. Whether it's social media, or you run a quick query on ChatGPT or Gemini or whatever you have available to you, or someone builds an in-house algorithm for a company,
00:08:44
Speaker
you need to know what decisions are being made. And sometimes those algorithms may have very subtle, implicit bias. You don't realize that until you've run them a few times. Yeah, absolutely. And I think when you understand the mathematical basis for why these algorithms work, the reasons for the bias also become obvious. So you can understand where the bias comes from, whether it's from the data or whether it's from the choices you make in the algorithm, et cetera.
00:09:19
Speaker
And the math can reveal that in ways that language alone cannot. We can talk about it in language, but somehow the math says its own piece. Yes. Now.

History and Evolution of Neural Networks

00:09:36
Speaker
You know, we talk about what's obvious, and for most people AI is simply ChatGPT, but it does have a history as well. Tell us a little bit more about that, because people don't see the history and evolution of this.
00:09:53
Speaker
Yeah, I mean, machine learning goes back a long way. I think for most people now, artificial intelligence or AI is literally just ChatGPT. Yeah, and ChatGPT is being used as a kind of placeholder for all kinds of large language models. But the history goes back, at least a certain kind of history. Take, for instance, neural networks. They go back to the late 1950s. Frank Rosenblatt was the first one to come up with what he called the perceptron algorithm,
00:10:28
Speaker
which is essentially a single-layer neural network that could be trained to do classification, for instance, to distinguish between two kinds of images. The idea was that these images occupy some coordinate space, where one type of image sits in one part of the coordinate space and the other type in another part. If you could draw a straight line between the two classes of images, then this algorithm was guaranteed to find that linear divide in finite time. And this was a really big deal in the late 1950s. And there was also the brilliant Bernard Widrow, who with his student Ted Hoff came up with the least mean squares algorithm, also for training a single neuron, a single-layer neural network.
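To make the perceptron concrete, here is a minimal sketch in Python. It's an illustration added for this transcript, not code from the book or the episode; the toy clusters are invented.

```python
import numpy as np

# Minimal perceptron sketch (illustrative toy, not from the book).
# It learns a linear divide w.x + b = 0 for linearly separable data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)),   # class -1 cluster
               rng.normal(+2, 0.5, (20, 2))])  # class +1 cluster
y = np.array([-1] * 20 + [+1] * 20)

w, b = np.zeros(2), 0.0
converged = False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # the point is misclassified
            w += yi * xi             # Rosenblatt's update rule
            b += yi
            converged = False
print("learned divide:", w, b)
```

For linearly separable data like this, the perceptron convergence theorem guarantees the loop terminates after finitely many updates, which is the "finite time" guarantee mentioned above.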
00:11:23
Speaker
At that time, in the late 1950s to early 1960s, that's all they could train: a single-layer neural network. And the neural network essentially was composed of artificial neurons. Artificial neurons are very, very loosely inspired by biological neurons. I think one should be very clear that the term artificial neural network somehow seems to suggest that these things are as complex as biological networks; they're not. An artificial neuron is just a computational unit that takes in some inputs, does some computation on them, and then applies some rule to decide what output to produce. And an artificial neural network is simply a network of these artificial neurons. It's only loosely analogous to a biological neural network and what's happening in a biological neural network.
00:12:12
Speaker
The complexity of the biological version is so, so much greater. Nonetheless, these are artificial neural networks, and the history goes back to the 1950s, early 1960s. But what happened at that time was that these were single-layer neural networks. People didn't know how to train anything more complicated than a single layer. And around that time, two other AI researchers, Marvin Minsky and Seymour Papert, came up with a very, very beautiful little book called Perceptrons, named in honor of Frank Rosenblatt's work. In that book, they showed that single-layer neural networks could not solve even a simple problem. It was called the XOR problem,
00:13:00
Speaker
where if the data you had was of a kind where you couldn't draw a straight line through the coordinate space to separate the two classes of data, then this single-layer neural network just couldn't find the solution. They had mathematical proofs for that. And the suggestion in the book, taken up by the community, was that even if you have multiple layers in the neural network, it's still probably not going to solve such problems. And that put a huge damper on neural network research. People just stopped thinking about it. So pretty much, you know, towards the end of the 1960s, funding and interest in neural networks just fell off a cliff.
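The XOR roadblock is easy to see by running the same toy update rule on XOR's four points. This sketch (again an editor's illustration, not from the episode) never reaches an error-free epoch, because no straight line separates the two classes.

```python
import numpy as np

# XOR: four points no single straight line can separate, so the
# perceptron update never settles (illustrative sketch).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, +1, +1, -1])   # XOR labels written as -1/+1

w, b = np.zeros(2), 0.0
for epoch in range(1000):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:
        break   # never reached: the data is not linearly separable
print("misclassifications in the final epoch:", errors)
```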
00:13:48
Speaker
But meanwhile, other algorithms were being developed, too. A lot of machine learning's legacy has to do with non-neural-network-based algorithms. Now, tell us a little bit more: while research on artificial neural networks came to that big standstill, what were the other breakthroughs that were happening?

Key AI Breakthroughs

00:14:15
Speaker
Oh, again, this is subjective. This is my take on what was happening. The k-nearest neighbors algorithm was a conceptually brilliant piece of work that also came out of the 1960s,
00:14:31
Speaker
from Thomas Cover and Peter Hart at Stanford, who tackled the nearest neighbor problem. And that was conceptually so, so simple. You basically think of your images as vectors: each image gets turned into a vector, and the vectors live in this high-dimensional space. So let's say images of cats are in one part of the coordinate space and images of dogs are in another part, and then you get a new image. You convert that into a vector, you chart it in the same coordinate space, and you see what it is nearest to. Is it nearest to the dogs or is it nearest to the cats? And depending on that, you classify the new image as a cat or not.
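A minimal k-nearest-neighbors sketch in Python, with invented toy data standing in for the cats and dogs, might look like this:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Invented toy data: two clusters standing in for cats and dogs.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(5, 1, (25, 2))])
y_train = np.array(["cat"] * 25 + ["dog"] * 25)
print(knn_predict(X_train, y_train, np.array([4.5, 5.2])))  # -> dog
```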
00:15:14
Speaker
And of course, it sounds very simple when you say it like that, but the underlying math that makes a formal case for it was extremely complicated. Still, they managed to show that it's an algorithm that does really, really well. It was a big breakthrough, it was very powerful, and it was used for a long time. It does run into this problem called the curse of dimensionality, though: as you start piling on higher and higher dimensions, data starts behaving in very strange ways. So as the datasets got very big, it stopped doing so well, computationally. And then,
00:15:58
Speaker
I think there was also a lot of work to do with, you know, the naive Bayes classifiers, or Bayes classifiers generally. That's also a significant thread in my book, where I'm getting the reader to appreciate the probabilistic aspects of machine learning. We seem to think that when machines make a prediction, they're certain about what the prediction is, but actually, under the hood, a lot of these things have uncertainties. There's a certain amount of uncertainty in a prediction that can be quantified, or the algorithms are explicitly probabilistic. And they're choosing the most likely answer, so it's very possible that a slightly less likely answer is the correct one.
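As a rough illustration of that probabilistic view (an invented toy, assuming SciPy is available; the feature and the numbers are made up), a naive-Bayes-style classifier returns probabilities rather than certainties:

```python
import numpy as np
from scipy.stats import norm

# Naive-Bayes-style toy: the prediction is a probability, not a
# certainty. One invented 1-D feature, two classes, equal priors.
samples_a = np.array([150.0, 155.0, 160.0, 158.0])  # class A
samples_b = np.array([170.0, 175.0, 180.0, 172.0])  # class B

def posterior(x):
    like_a = norm.pdf(x, samples_a.mean(), samples_a.std())
    like_b = norm.pdf(x, samples_b.mean(), samples_b.std())
    p_a, p_b = like_a * 0.5, like_b * 0.5   # likelihood times prior
    z = p_a + p_b                           # normalize over both classes
    return p_a / z, p_b / z

p_a, p_b = posterior(166.0)
print(f"P(A)={p_a:.2f}, P(B)={p_b:.2f}")  # the runner-up class could
                                          # still be the right answer
```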
00:16:44
Speaker
So understanding these kinds of uncertainties and probabilistic aspects of machine learning, from the mathematical perspective, makes a big difference in appreciating what's going on. But probably, for me, the most influential and important algorithm prior to neural networks was support vector machines. I was saying earlier that the perceptron finds a linear divide between two classes of data. Now, imagine two clusters of data in coordinate space, one on the right side and one on the left side, with a linear gap between them.
00:17:26
Speaker
In principle, you can draw an infinity of straight lines through that gap. It could be a straight line, or a plane, or a hyperplane, depending on the dimensionality of your data. In principle there's an infinity of them, and the perceptron does not find the most optimal one. It just finds some divide between the two classes. Support vector machines were the first ones to find what's called an optimal margin classifier. It finds the linear divide between the two classes that is as good as possible, the most optimal separation between them. And then there's the other thing, which relates to the linear separability problem I mentioned: one of the wonderful things about support vector machines is the way they use these things called kernel methods.
00:18:18
Speaker
And kernel methods are just this really, really amazing sort of mathematical trick. Say you have data where you cannot find a linear divide, because the data is so mixed up that you would have to draw some really convoluted curve to separate the two classes of data. The idea is to project this data into really high dimensions, possibly even infinite dimensions. And it boggles the mind to think what infinite dimensions could be. But in order for support vector machines to work, they have to compute dot products of the
00:18:57
Speaker
vectors representing the data points and the new data that you're dealing with. Now, if you're going into very, very high dimensions, or even infinite dimensions, storing these vectors or computing the dot product is computationally really expensive. So what kernel methods do is find a function that lets you take two vectors, and that function produces the dot product, not of the vectors in the lower dimensions, but the dot product of the vectors in the higher dimensions. So basically, let's say there's a function K of two vectors, x and y.
00:19:34
Speaker
Then the function K gives you the dot product of x and y, not in the lower dimensions, but as if these vectors were projected into the higher dimensions. So you are able to do computations in the higher-dimensional space without actually ever converting your vectors and stepping into the higher-dimensional space. Computationally you do your work in the lower-dimensional space, but the result you're getting represents vectors in the higher-dimensional space. And in the higher-dimensional space, you can find the linear divide, and these linear classifiers are beautiful. So you work in the higher-dimensional space, find the linear divide, and then you project back into the lower dimension. And in the lower dimension, that straight line looks like a very convoluted curve. So you essentially end up finding a non-linear boundary between two classes of data.
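Here is the kernel trick in miniature, a sketch using the quadratic polynomial kernel: the cheap low-dimensional computation provably equals a dot product in a hand-built higher-dimensional feature space.

```python
import numpy as np

# Kernel trick in miniature: K(x, y) = (x.y)^2 computed in 2-D equals
# an honest dot product in a 3-D feature space we never visit.
def phi(v):
    """Explicit map from R^2 to R^3: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def kernel(x, y):
    return (x @ y) ** 2        # cheap: stays in the low dimension

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(kernel(x, y))            # 16.0
print(phi(x) @ phi(y))         # 16.0, the same number, the hard way
```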
00:20:22
Speaker
You projected your data into the higher-dimensional space, applied this linear classifier, and then mapped it back. But computationally, you're always working in the lower-dimensional space. And that, I think, is just amazing. It really is. Because you wouldn't think of that if you were just writing the problem out mathematically, so simplistically. How do I put it? It skips the middle fluff of the problem and you just get the output, so it saves a lot of computational time.
00:21:04
Speaker
And it's quite a craft and art to design these kernels. There are lots and lots of different kinds of kernels, and they become more and more mathematically sophisticated. So it's a separate field in itself: choosing kernels, choosing what kinds of things you project into the higher-dimensional space, and which features in that higher-dimensional space you work with. So that's a craft that has become a field by itself. Exactly. So what's really so special about support vector machines and kernel methods?
00:21:39
Speaker
I think the specialty is that support vector machines find what are called optimal margins. If you have two classes of data and you want to find a divide between those two, you want to find a straight line or a curve that separates those two classes of data. So then when you get a new piece of data and you want to figure out whether it comes from class A or class B, you want to be able to do that as correctly as possible. You'll always make some mistakes, but if the dividing line you found between the two classes of data is optimal, then the chance of making mistakes when you classify new, unseen data is lower.
00:22:21
Speaker
I mean, there'll always be mistakes, but you're basically lowering your probability of making mistakes. So that's support vector machines. But support vector machines find a linear divide between the data points that they've got. They work in any dimensional space, but if you keep going into higher and higher dimensions, it gets computationally intractable. So that's where kernel methods come in. The combination of support vector machines and kernel methods works like I just said: the kernel methods project the data into high dimensions and find the linear divide in the high dimensions, while the computation all happens in the lower-dimensional space.
00:22:58
Speaker
So, essentially, you're finding some straight line in, I don't know, 10,000 dimensions, 100,000 dimensions; it doesn't matter what it is. It could even be infinite dimensions. In infinite dimensions, you can always find a straight line to separate two classes of data. But when you project that back into whatever dimensions you started with, 100 dimensions or 20 dimensions, that straight line turns into a very nonlinear curve. And so the combination of support vector machines and this ability to find a straight line in higher dimensions, without computing in those higher dimensions, that's the power of support vector machines. And they were used for all sorts of classification problems prior to neural networks becoming dominant. Okay. Awesome.
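As a usage sketch (assuming scikit-learn is installed; the ring-and-blob data is invented), a kernelized SVM separates classes that no straight line in the original 2-D space can:

```python
import numpy as np
from sklearn.svm import SVC

# A central blob inside a ring: no straight line in 2-D separates them,
# but an RBF-kernel SVM finds the boundary by implicitly working in a
# far higher-dimensional space.
rng = np.random.default_rng(2)
angles = rng.uniform(0, 2 * np.pi, 100)
ring = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]  # class 1, radius 3
blob = rng.normal(0, 0.5, (100, 2))                   # class 0, center
X = np.vstack([blob, ring])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf").fit(X, y)   # kernelized optimal-margin classifier
print(clf.predict([[0.2, -0.1], [2.9, 0.4]]))  # expected: [0 1]
```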
00:23:44
Speaker
Now, as we see machine learning progressing on multiple fronts, where do artificial neural networks start coming back into the fray? When did this happen and why? So, after Minsky and Papert made their case against single-layer networks, saying they couldn't even solve the XOR problem, there wasn't a lot of research in that field. But there were a few people who persisted. Again, in my book, I take a few examples of the kinds of work that kept going on.
00:24:28
Speaker
So, one big development happened in the early 1980s. John Hopfield developed something called Hopfield networks. Hopfield networks are, again, a certain kind of architecture of these artificial neurons, connected recurrently. Normally in a neural network, the input comes in from one side, goes into one layer of neurons, that layer of neurons produces an output, which then goes into a subsequent layer, and it keeps going in one direction. These are called feed-forward networks: the output of a neuron never feeds back to itself or to the neurons in its own or an earlier layer. Those are the standard things we see today, these feed-forward networks. John Hopfield developed a slightly different architecture with recurrent connections, where a neuron's output could feed back into other neurons in its own layer. So you get this recurrent neural network, and he showed how
00:25:33
Speaker
you could store memories in these networks. What you do is essentially figure out how to store an image in such a network, and the basic idea is this: you adjust the weights of the network such that when you store your image, the network has what is called minimal energy. There's a way of defining what the energy of a neural network means in this context. The math shows that if the image is the output of the network's neurons, let's say there are one hundred neurons, and each one of those neurons outputs something you take as the value of a pixel, then you've got 100 pixels, a 10-by-10 image.
00:26:18
Speaker
Right? Yes. So you store that image, like you mentioned, in this particular Hopfield network. The math shows that the network is then at its lowest energy state. And now what you do is perturb the network: you change some of the outputs, so the stored image is corrupted and you've got all sorts of noise in it. But the dynamics of the network are such that it wants to go back to the low-energy state.
00:26:51
Speaker
The dynamics of the network are such that all the neurons will start firing in response to the nearby neurons, and eventually they will all settle back into the state that represents the minimal energy. And when you read off the outputs of the neurons, you get back the original image, because that image represents the lowest energy state. That was a very big deal in 1982, and John Hopfield came up with it. The memory is recovered by the network's own dynamics, rather than being supplied again as input.
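A toy Hopfield network along these lines can be sketched in a few lines of numpy. The Hebbian storage rule and the 10-by-10 sizing follow the description above; the specific numbers are invented.

```python
import numpy as np

# Hopfield network toy: store one 10x10 pattern with a Hebbian rule,
# corrupt it, then let the dynamics settle back to the stored memory.
rng = np.random.default_rng(3)
pattern = rng.choice([-1, 1], size=100)        # 100 neurons = 10x10 pixels

W = np.outer(pattern, pattern).astype(float)   # Hebbian weights
np.fill_diagonal(W, 0)                         # no self-connections

state = pattern.copy()
flip = rng.choice(100, size=15, replace=False) # perturb 15 of the pixels
state[flip] *= -1

for _ in range(5):                             # asynchronous updates
    for i in rng.permutation(100):
        state[i] = 1 if W[i] @ state >= 0 else -1

print("memory recovered:", np.array_equal(state, pattern))  # True
```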
00:27:25
Speaker
And then, again, there are other things that happened, but I think the real key development was in 1986, when Rumelhart, Hinton, and Williams published the backpropagation algorithm.

Rise of Neural Networks in Modern AI

00:27:40
Speaker
This was, finally... I mean, I have to make sure that we get the history right, in the sense that this algorithm is often attributed to Rumelhart, Hinton, and Williams, but the idea of backpropagation goes all the way back to Frank Rosenblatt in the late 1950s, early 1960s. He knew that in order to train multilayer neural networks, which have more than one layer, you need some kind of algorithm. He actually used the term backpropagation in his book, but did not know how to do it. And from the 1960s all the way to the mid-1980s, there were various people who
00:28:20
Speaker
attempted to solve this problem in different ways, not necessarily for neural networks. There was work going on in control systems, in rocketry, in economics. So these ideas were all there in different forms, but it was the Rumelhart, Hinton, and Williams paper that clearly showed how to do this for deep neural networks. And deep simply means that the neural network has at least one layer that is sandwiched between the input and output layers. So it's hidden: it's hidden from the input side and it's hidden from the output side. Any neural network which has at least one hidden layer is technically a deep network, and it requires the backpropagation algorithm to train it. And so that was a big, big deal in 1986.
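A bare-bones sketch of backpropagation for one hidden layer (an illustration, not the 1986 paper's code) even cracks the XOR problem that stalled single-layer networks:

```python
import numpy as np

# One hidden layer trained by backpropagation (a bare-bones sketch).
# The smallest "deep" network learns XOR, which a single layer cannot.
rng = np.random.default_rng(4)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output

lr = 1.0
for step in range(20000):
    h = sigmoid(X @ W1 + b1)                     # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)          # backward: output layer
    d_h = (d_out @ W2.T) * h * (1 - h)           # error sent back to hidden
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```

As the comments note, convergence depends on the random initialization; the point is only to show the error flowing backward through the layers.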
00:29:05
Speaker
And, again, this is subjective, but another key development was the so-called universal approximation theorem in 1989. It is one thing to say that you can train a deep neural network to do something, but it's quite another thing to mathematically prove that there is almost no limit to what a neural network can approximate. That is what the universal approximation theorem established. Here's one way to think about what neural networks are doing: essentially you have some input x
00:29:38
Speaker
and a desired output y. The neural network has to find the function f such that y is equal to f(x). And that function can be anything; it depends on what kind of correlation exists between your input and output. And that correlation exists in the data. So the training data is basically some set of pairs of inputs and outputs that you are providing the neural network.
00:30:05
Speaker
You don't know how the input and output are connected, what kind of function maps the input to the output. That's what the neural network has to find: the algorithm figures out what the function f is that maps x to y. Now, what the universal approximation theorem shows is that if you have a neural network with at least a single hidden layer, then, given enough artificial neurons, which means given enough computation, the neural network will approximate almost any function that is necessary to map the input to the output. So basically it's a
00:30:49
Speaker
universal function approximator. Again, given enough artificial neurons: in principle that's possible, but in practice you never have enough, so there will always be functions you can't represent exactly. But that's what the theorem shows very clearly: a neural network is essentially a universal function approximator, and given enough neurons it will approximate any function closely enough to map the input to the output. And these were really big developments through the 1980s. Right. And I think that set the stage for further research, because even tiny networks could clearly do amazing things, even if they weren't yet powerful enough in practice.
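A small numerical illustration of the theorem's flavor (an invented example: random sigmoid neurons with output weights fit by least squares, not a proof) shows the approximation improving as neurons are added:

```python
import numpy as np

# Universal approximation in miniature:
#   y ~ sum_i a_i * sigmoid(w_i * x + b_i)
# Random hidden neurons, output weights fit by least squares; more
# neurons generally mean a better approximation of the target.
rng = np.random.default_rng(5)
x = np.linspace(-np.pi, np.pi, 200)
target = np.sin(x)                        # stand-in for the unknown f

for n_hidden in (3, 30, 300):
    w = rng.normal(0, 3, n_hidden)        # fixed random hidden weights
    b = rng.normal(0, 3, n_hidden)
    H = 1 / (1 + np.exp(-(np.outer(x, w) + b)))     # hidden activations
    a, *_ = np.linalg.lstsq(H, target, rcond=None)  # fit output weights
    print(f"{n_hidden:4d} neurons -> max error "
          f"{np.abs(H @ a - target).max():.4f}")
```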
00:31:30
Speaker
But even so, these advances weren't enough. Support vector machines dominated through the 1990s and early 2000s. What changed? I think in order for neural networks to become what they've become today, a few things had to change. One was the availability of really large datasets. Neural networks are extremely data hungry, and they need a lot of human-annotated data, or just basically a lot of data. And that
00:32:05
Speaker
didn't happen until the end of the first decade of the 2000s. So we had ImageNet, one of the first really large-scale datasets for image recognition. And then the other thing that had to change: these neural networks do a lot of matrix manipulations, and at some point people figured out that the GPUs that were being used for video games were ideal processors for doing a lot of these matrix manipulations. So it was a combination of the availability of large datasets, the backpropagation algorithm to train these networks, and the use of GPUs
00:32:49
Speaker
that finally brought neural networks to the fore. That's when they became really powerful, and the first neural-network-based image recognition system that broke through, AlexNet in 2012, made such a big splash. After that, neural networks pushed aside support vector machines. And yeah, it really took all these elements, the availability of large datasets because of the internet, the use of GPUs, and backpropagation, all coming together to make the difference. Absolutely. Now, your book ends with a look at how deep neural networks are challenging standard ideas of how machine learning works. Tell us a little bit about that.

Challenges in Neural Network Theory

00:33:39
Speaker
Yeah, so standard machine learning theory, one of the things it says is that if your model is overparameterized, meaning there are a lot of different knobs you can tune in order to fit your data, and there are way too many of them, then you'll essentially end up overfitting the data. You don't want to fit your training data too closely, because there might be noise in the data, and when you overfit, you end up fitting the noise also. So when you start testing your model on new data, which might have different kinds of noise, you're not going to do well on the new data that you haven't seen. If you're trying to make predictions about unseen data, but you have fit your model to the training data in such a way that you're picking up all the nuances that exist in the training data, then when you see new data, you're going to make lots of mistakes.
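The classical picture can be sketched with polynomial fits (an invented toy, not from the book): an overparameterized model drives training error toward zero while test error typically gets worse.

```python
import numpy as np

# Classical overfitting: with 20 noisy points, a degree-15 polynomial
# (many parameters) nails the training set but typically does worse on
# fresh test data than a modest degree-3 fit.
rng = np.random.default_rng(6)
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)   # signal plus noise

x_train, y_train = sample(20)
x_test, y_test = sample(200)

for degree in (3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: "
          f"train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The puzzle described next is that huge neural networks seem to escape this trade-off.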
00:34:30
Speaker
So basically, traditional machine learning theory says that you have to find the right balance between underfitting and overfitting. You have to be somewhere in the middle, the Goldilocks zone. But the very intriguing part about neural networks is that they are actually extremely overparameterized. They have way too many parameters, and yet they don't seem to overfit the data. They seem to generalize well to unseen data, as long as the data is drawn from the same underlying distribution. So there's something going on that we don't fully understand, and standard machine learning theory is unable to explain why
00:35:06
Speaker
neural networks don't overfit and why they generalize despite having enormous numbers of parameters. I mean, when you think of modern neural networks, we're talking of half a trillion parameters, even a trillion parameters. Those are massive. And of course, the data is also very massive, but nonetheless, there's something about the way these networks function that still hasn't been theoretically well explained. Sure. So the book ends at that point, where we point out that we have entered what one of the researchers I spoke to called terra incognita. Basically, this is unknown terrain as far as machine learning is concerned. We have a lot of empirical evidence about how these networks behave, but we don't necessarily have a good theoretical understanding of what's happening.
00:35:54
Speaker
Now, I know that there's always been this larger question: are machines conscious? Is consciousness computational? I know that you've given talks on that sort of stuff before, and maybe I'll leave this as a teaser for future work from you. You talked a little bit in the book, and in talks, about whether machines are really conscious. Will they ever get to that level? This is a really tough and hard question, not because I don't want to answer it, but because it really depends on what one means by consciousness and what it might be.

Machine Consciousness Debate

00:36:36
Speaker
So there are certain ways of thinking about what consciousness is. If you take those definitions, then machines are not going to be conscious, because
00:36:46
Speaker
there are people who will argue that consciousness is really a property of biological systems and the way biological systems are organized, and that consciousness has properties that cannot be explained by the physics of the material world, and so on. That's a particular way of thinking about consciousness, and if that's the case, it's unlikely that machines are going to be conscious anytime in the near future. But there are other ways of thinking about consciousness where it's very likely that we'll end up with machines that are conscious in the way that biological systems are conscious. They may not be conscious of all of the stuff that we are conscious of, but machines will be conscious of what they are conscious of. So it really hinges on definitions of consciousness and what we mean by it. If you take a purely materialistic perspective on what consciousness might be, it is just an emergent property of the way
00:37:36
Speaker
matter is organized, and in this case the matter that comprises us, the biological system; we are biological machines. Then, in principle, and I'm not saying this is practical or that it will happen anytime soon, but in principle, yes, it's entirely possible that machines will be conscious.
00:37:59
Speaker
That's an interesting take on it. Yeah, honestly, at this point there is enough of a lack of evidence that we can't nail down which theory or which idea about consciousness is correct. So all of them are in the running. And depending on how the empirical evidence falls, one of these theories hopefully will start making more and more sense, and then we will be able to say with more certainty whether machines will be conscious or not. Absolutely. Now, out of curiosity, when writing this book, what was one of your favorite things to learn or relearn?

Math's Foundational Role in AI

00:38:40
Speaker
For me personally, I'm like you: I trained as an electronics and computer engineer. As an electronics engineer, you do a fair bit of math, but in those days you didn't quite pay attention to what the math was saying or doing. So it was rediscovering how powerful some of these basic ideas are, like linear algebra, calculus, probability and statistics, all of the things underlying machine learning. We are building machines that are learning now, and it turns out that the math you learned in the final year of high school or the first year of undergraduate study is enough to appreciate what is happening with AI today. That to me was the most beautiful thing: all these basic ideas that I had learned a long, long time ago, and probably never paid enough attention to, are now front and center in the technology that is going to influence our lives going forward.
00:39:41
Speaker
I have to agree with you on that, because it took most of graduate school and the years afterwards to really think about the processes you learned in your last year of high school and first year of college, and to actually understand what was going on. And sometimes it takes a while for those gears in your head to click and go, oh, I see what's going on in that larger system. Yeah. I mean, who would have thought, at least I didn't, that linear algebra would be so central to what's happening right now. We all learned about vectors and matrices, and they just seemed like a whole bunch of manipulating of numbers,
00:40:28
Speaker
but that's what these machines are doing. And understanding all these algorithms depends on having at least the basics of linear algebra to figure out what the hell is happening. Exactly. Now, to kind of wrap this up time-wise: what is one big thing that you would like readers to take away from the book? From the book, or from listening to the podcast; both. My desire for readers, when they encounter this book or the podcast, is, like we said at the very beginning, that there needs to be a whole bunch of people who start appreciating and understanding the basic mathematics that underlies the technology we are unleashing on this world. And it's almost a responsibility, for those of us who want a deeper understanding, to try.
00:41:26
Speaker
It's not easy, and yet it's beautiful enough that if we persist, there's joy in appreciating what's happening, but there's also the prospect of becoming more aware of what we're in for. This technology is not going away; it's going to get embedded deeper and deeper in our society, and we can't leave it to just the people building it. We need others to be able to understand it enough to question them and hold people accountable for what they're building. And I personally don't see how we can get to that point without some appreciation of the math. I completely agree. So thank you for coming on Breaking Math and chatting with our listeners; I'm so happy that you could join us today.
00:42:21
Speaker
Thank you very much, Autumn. This has been my pleasure entirely.