
Why Machines Learn: The Math Behind AI

Breaking Math Podcast
4.5k Plays · 5 months ago

In this episode, Autumn and Anil Ananthaswamy discuss the inspiration behind his book "Why Machines Learn" and the importance of understanding the math behind machine learning. He explains that the book aims to convey the beauty and essential concepts of machine learning through storytelling, history, sociology, and mathematics. Anil emphasizes the need for society to become gatekeepers of AI by understanding the mathematical basis of machine learning. He also explores the history of machine learning, including the development of neural networks, support vector machines, and kernel methods. Anil highlights the significance of the backpropagation algorithm and the universal approximation theorem in the resurgence of neural networks.

Keywords: machine learning, math, inspiration, storytelling, history, sociology, gatekeepers, neural networks, support vector machines, kernel methods, backpropagation algorithm, universal approximation theorem, AI, ML, physics, mathematics, science

You can find Anil Ananthaswamy on Twitter @anilananth and his new book "Why Machines Learn."

Subscribe to Breaking Math wherever you get your podcasts.

Become a patron of Breaking Math for as little as a buck a month

Follow Breaking Math on Twitter, Instagram, LinkedIn, Website, YouTube, TikTok

Follow Autumn on Twitter and Instagram

Follow Gabe on Twitter.

Become a guest here

email: [email protected]

Transcript

Introduction to AI Mathematics

00:00:00
Speaker
Welcome to Breaking Math, the podcast where we can freely question why machines learn. And we get to learn more about the mathematics behind your favorite AI tools like ChatGPT, Sora, and Gemini.

Guest Introduction: Anil Ananthaswamy

00:00:11
Speaker
Today, we are joined by Anil Ananthaswamy.
00:00:14
Speaker
He's an award-winning science writer and former staff writer and deputy news editor for the London-based New Scientist magazine, and the 2019-2020 MIT Knight Science Journalism Fellow. He writes for Quanta, Scientific American, Nature, New Scientist, and many other publications. His first book, The Edge of Physics, was voted book of the year in 2010 by the UK's Physics World. His second book, The Man Who Wasn't There, was longlisted for the 2016 PEN/E. O. Wilson Literary Science Writing Award. And his third book, Through Two Doors at Once, was named one of Forbes' 2018 best books about astronomy, physics, and mathematics.

Journey into Machine Learning Writing

00:00:58
Speaker
His new book, Why Machines Learn: The Elegant Math Behind Modern AI,
00:01:04
Speaker
was just published today. Anil trained as an electronics and computer engineer at IIT Madras for his bachelor's degree and at the University of Washington, Seattle, for a Master of Science in electrical engineering. He later worked as a distributed systems software engineer and architect before switching to writing, and he studied science journalism at UC Santa Cruz.
00:01:31
Speaker
Now with Anil's impressive background, let's dive into this rich history of mathematics that drives machine learning.
00:01:50
Speaker
Hi Anil, how are you doing today? I'm doing well, thank you. It's a pleasure to be on your show. Thank you for coming on. Now, out of curiosity, where did you start with the book originally? What was the inspiration for it? So I've been a journalist for about 20 years. I used to be a software engineer before I became a journalist. And for a long time, I was writing on topics like astrophysics, cosmology, neuroscience, particle physics. All the cool stuff. But what was interesting is that when I would write about those topics,
00:02:25
Speaker
I would try and understand the science as best as I could and then, you know, write about it. I never had any illusions about being able to actually do that work. But around 2016, 2017, I noticed that I was writing more and more stories about machine learning. It was just beginning to ramp up.
00:02:46
Speaker
I think every time I would interview the sources and start talking to them about what they were doing, the software engineer part of me woke up. It had been dormant for 15, 20 years. And so some part of me really wanted to learn how to code and build these systems.
00:03:03
Speaker
And in 2019, I got a fellowship at MIT, the Knight Science Journalism Fellowship. As part of that fellowship, I did a project learning how to code. Going back to class, we literally went back to CS101; I sat with all these teenagers learning Python coding.
00:03:20
Speaker
It was intense. You know, as I started coding, I realized that I really wanted to learn the basics of machine learning, not just code blindly. So I started listening to a lot of YouTube lectures by professors, one particular professor at Cornell who was very instrumental in my enjoying the subject. And it was when I was doing that that I realized the mathematics underlying machine learning is quite beautiful.
00:03:48
Speaker
And you know, then the writer in me woke up and said, oh, I have to communicate this to readers. So that's how it came about.

Purpose of the Book on Machine Learning

00:03:58
Speaker
Finally, you know, it was a book that I would have liked to read myself when I was trying to learn machine learning: some combination of journalistic storytelling, history, sociology, and the math.
00:04:11
Speaker
So it's really that I found the math really lovely and I wanted to get that across, hence the book full of equations. I think that's crucial for anybody who either needs a refresher or is switching fields, anything of that sort. And sometimes, even when you're really advanced in the field, we need to take a couple of steps back and think about the story behind it.
00:04:42
Speaker
and have that appreciation for it. I think this book is crucial for people to have, even if it's just for that refresher.
00:04:56
Speaker
And it really highlights everything that you need to know. Yeah, of course, you know, this is my subjective take, based on a lot of research and a lot of conversations with the experts, about what I felt was the beautiful and essential part of machine learning. I'm sure
00:05:13
Speaker
someone else might have a slightly different take on what is crucial. You can only put so much into a book; it's already almost 450 pages long. And so I probably had to exclude, you know, crucial algorithms. But I think my intent was to get at the conceptual heart of machine learning.
00:05:31
Speaker
You know, not just the algorithms, but to get behind the ideas that underpin ML beyond deep neural networks, because those are recent phenomena. There has been so much other work that has happened. So yeah, I agree with you that the book kind of works

Societal Understanding of Machine Learning's Math

00:05:50
Speaker
either as a refresher for someone who has a background in it, or for people who have maybe high-school-level mathematics and really want to understand ML at a depth that is more than what you might get from reading magazines. Now, besides appreciating the beauty and elegance of some of the basic math behind machine learning, what are really the other reasons for writing the book?
00:06:18
Speaker
You know, I really think that, watching how AI is developing and watching the big companies that are kind of becoming the gatekeepers for this technology, I feel like we need a layer of society, apart from the people building these systems, that really understands what's happening with at least a certain amount of mathematical rigor, because it's only when you look at the math behind machine learning that you start understanding both the power that is inherent in these systems and also their limitations.
00:06:57
Speaker
Right, so we need communicators, science communicators, journalists, policymakers, teachers, just about anyone in society who is willing to engage with this technology at the level of at least the basic math. So we need society to become gatekeepers also, not just leave it to the people who are building these systems. And I think, for me, that's why it was really important to convey the mathematical basis for this.
00:07:27
Speaker
I agree, because so much decision making is done now through AI, whether it's social media, or running quick code on ChatGPT or Gemini or whatever you have available to you, right? Or someone builds an in-house algorithm for a company.
00:07:48
Speaker
You need to know what decisions are being made, and sometimes those algorithms may have implicit bias. You don't realize that until you code it a few times.
00:08:01
Speaker
And I think when you understand the mathematical basis for why these algorithms work, the reasons for the bias also become obvious. So you can understand where the bias comes from, whether it's from the data or from the choices you make in the algorithm, etc.
00:08:18
Speaker
And the math conveys that in ways that just language is unable

History and Evolution of Machine Learning

00:08:25
Speaker
to. I mean, we can talk about it in plain language, but somehow the math just makes it so obvious.
00:08:32
Speaker
Yes. Now, tell us a little bit more about the history and evolution of this. The history of machine learning goes back a long way. I think for most people now, artificial intelligence or AI is literally just ChatGPT, used as a kind of placeholder for all kinds of large language models. You know, GPT is not the only one out there, but the history goes back a long way, at least a certain kind of history. For instance, the history of neural networks goes back to the late 1950s.
00:09:02
Speaker
So Frank Rosenblatt was the first one to come up with an algorithm called the perceptron algorithm, which is essentially a single-layer neural network that could be trained to do classification, for instance, if it had to distinguish between patterns of two kinds of images.
00:09:20
Speaker
And the idea was that these images somehow occupied some coordinate space, where one type of image was in one location in the coordinate space and the other images were in another portion of the coordinate space. And if you could draw a straight line between the two classes of images, then this algorithm was guaranteed to find that linear divide in finite time. This was a pretty big deal in the late 1950s, early 1960s.
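To make the idea concrete, here is a minimal sketch of a Rosenblatt-style perceptron update on made-up, linearly separable 2-D points; the data, learning loop, and stopping rule are illustrative choices, not taken from the book.

```python
import numpy as np

# Toy, linearly separable data: two clusters in 2-D, labeled +1 and -1.
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 1.5],         # class +1
              [-1.0, -2.0], [-2.0, -1.5], [-1.5, -1.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

w = np.zeros(2)   # weight vector
b = 0.0           # bias

# Perceptron rule: whenever a point is misclassified, nudge the boundary toward it.
for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:   # point is on the wrong side (or on the line)
            w += yi * xi
            b += yi
            errors += 1
    if errors == 0:   # converged: every point is on the correct side
        break

print("weights:", w, "bias:", b)
```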
00:09:51
Speaker
And also there was Bernard Widrow, who with his student Ted Hoff came up with the least mean squares algorithm, also for training a single neuron, a single-layer neural network. And at that time, in the late 1950s, early 1960s, that's all they could do. They could only train a single-layer neural network.
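For comparison, here is a minimal sketch of the Widrow-Hoff least-mean-squares update on toy data; the inputs, targets, and step size are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the desired output is a noisy linear function of a 2-D input.
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
d = X @ true_w + 0.1 * rng.normal(size=200)   # desired outputs

w = np.zeros(2)
lr = 0.01   # step size (illustrative choice)

# LMS rule: after each sample, nudge the weights against the gradient of the squared error.
for epoch in range(5):
    for xi, di in zip(X, d):
        error = di - np.dot(w, xi)
        w += lr * error * xi

print("learned weights:", w)   # should land near [1.5, -2.0]
```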
00:10:12
Speaker
The neural network essentially was composed of artificial neurons. Artificial neurons are very, very loosely inspired by biological neurons. I think we should be very clear that the term artificial neural networks somehow seems to suggest that these things are as complex as biological networks. They're not.
00:10:32
Speaker
The artificial neuron is just a computational unit. It takes in some inputs, does some computation on them, and then decides whether to produce an output or not, or what kind of output to produce. And an artificial neural network is simply a network of these artificial neurons. And you know, if you actually look at a biological neuron and what's happening in a biological neural network,
00:10:54
Speaker
the complexity of that is so, so much greater, right? Nonetheless, these are artificial neural networks, and the history of these goes back to the late 1950s, early 1960s. But what happened at that time was that these were single-layer neural networks. People didn't know how to train any more complicated neural networks. If a network had more than a single layer, we didn't know how to train it.
00:11:19
Speaker
And around that time, two other AI researchers, Marvin Minsky and Seymour Papert, came out with a very beautiful, authoritative book called Perceptrons, in honor of Frank Rosenblatt. But in that book, they kind of showed that single-layer neural networks could not even solve a simple problem. It was called the XOR problem,
00:11:42
Speaker
where if the data that you had was of a kind where you couldn't draw a straight line through the coordinate space to separate two classes of data, then this single-layer neural network just couldn't find a solution. They had mathematical proofs for that, but they also suggested, without proving it, that even if you had multiple layers in the neural network, it still probably wouldn't be able to solve that. And that kind of put a huge damper on neural network research. People just kind of stopped thinking about it, because they thought, oh, this is not going anywhere. So pretty much towards the end of the 1960s and early 1970s, research in neural networks just kind of fell off a cliff.
00:12:30
Speaker
But meanwhile, other algorithms were beginning to be developed, and a lot of machine learning history has to do with non-neural-network-based machine learning algorithms. Now, tell us a little bit more about what happened while research on artificial neural networks came to that big standstill.
00:12:53
Speaker
What were the other breakthroughs? I thought the k-nearest neighbor algorithm was a conceptually brilliant piece of work that also came out in the 1960s, by Thomas Cover and Peter Hart of Stanford, right? The Cover-Hart k-nearest neighbor algorithm. And conceptually it's so, so simple. You basically think of your images as vectors; each image is converted into a vector.
00:13:21
Speaker
And the vectors are plotted in some high-dimensional space. So let's say you have images of cats on one side of the coordinate space and images of dogs on the other side of the coordinate space. Then you get a new image. You convert that into a vector, plot it in the same coordinate space, and see what it is nearest to. Is it nearest to dogs or is it nearest to cats? And depending on that, you classify the new image as a cat or a dog.
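Here is a minimal sketch of that nearest-neighbor idea, with made-up "cat" and "dog" vectors; the value of k and the toy coordinates are arbitrary choices for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy "image vectors" in 2-D (entirely made up for illustration).
X_train = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],    # cats
                    [4.0, 3.8], [4.2, 4.1], [3.9, 4.3]])   # dogs
y_train = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

print(knn_predict(X_train, y_train, np.array([1.0, 1.1])))  # -> cat
print(knn_predict(X_train, y_train, np.array([4.1, 4.0])))  # -> dog
```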
00:13:48
Speaker
And of course, it sounds very simple when you say it like that, but the underlying math that makes a formal case for it was extremely complicated. But they managed to show that it's an algorithm that does really, really well. That was a big breakthrough; it was very powerful and was used for a long time. But it runs into this problem called the curse of dimensionality. As you start getting into higher and higher dimensions, data starts behaving in a very strange way.
00:14:17
Speaker
And so, as the dimensionality of the data got very big, the k-nearest neighbor algorithm started not doing so well in terms of computational efficiency. There was also a lot of work to do with the naive Bayes classifiers and the optimal Bayes classifiers. That forms a significant chapter in my book, where I'm getting the reader to appreciate the probabilistic aspects of machine learning. We seem to think that when machines make a prediction they're dead certain about what the prediction is, but actually, under the hood, a lot of these things have uncertainties.
00:14:57
Speaker
So there is a certain amount of uncertainty in a prediction that can be quantified, or the algorithms are explicitly probabilistic. And they're essentially choosing the most likely answer. It's entirely possible that a slightly less likely answer is the correct one.
00:15:14
Speaker
But it chooses the most likely answer. So understanding these kinds of uncertainties and probabilistic aspects of machine learning algorithms, from the mathematical perspective, makes a big difference in appreciating what's going on.
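To show what a probabilistic prediction looks like, here is a tiny Gaussian naive Bayes sketch on invented 1-D data; it reports a posterior probability for each class rather than a bare label, and equal class priors are assumed.

```python
import numpy as np

# Toy 1-D training samples for two classes (made up for illustration).
x0 = np.array([0.0, 1.0, 2.0, 1.0])   # class 0
x1 = np.array([2.0, 3.0, 4.0, 3.0])   # class 1

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def posterior(x):
    """Return P(class | x) for both classes, assuming equal priors."""
    likelihoods = np.array([
        gaussian_pdf(x, x0.mean(), x0.var()),
        gaussian_pdf(x, x1.mean(), x1.var()),
    ])
    return likelihoods / likelihoods.sum()   # normalize so the two probabilities sum to 1

p = posterior(1.8)
print("P(class 0), P(class 1):", p)        # roughly [0.69, 0.31]: the prediction carries real uncertainty
print("most likely class:", int(np.argmax(p)))
```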
00:15:28
Speaker
Probably, for me, the most influential and important algorithms prior to neural networks were support vector machines. So I was telling you about Rosenblatt's perceptron, which finds a linear divide between two classes of data. Now, if you imagine two clusters of data in coordinate space, one on the right side and one on the left side, and there's a little bit of gap between them,
00:15:56
Speaker
in principle you can draw an infinity of straight lines through that gap. It can be a straight line, or a plane, or a hyperplane, or whatever, depending on the dimensionality of your data. But in principle, you can have an infinity of them. And the perceptron does not find the most optimal one. It just finds any divide between the two classes of data.
00:16:19
Speaker
Support vector machines were the first ones to find what's called an optimal margin classifier. It basically finds a linear divide between two classes of data that is as good as possible, not any old separation between the two.
00:16:36
Speaker
And then the other thing: I was telling you about the dimensionality problem. One of the wonderful things that support vector machines also do is use these things called kernel methods. And kernel methods are just this really, really amazing sort of mathematical trick, where you take low-dimensional data where you cannot find the linear divide.
00:16:59
Speaker
The data is so mixed up that you have to draw some really convoluted curve to separate two clusters of data, or three clusters of data, whatever. And so the idea would be to project this data into really, really high dimensions, possibly even infinite dimensions. It boggles your mind to think what infinite dimensions could be. But in order for support vector machines to work, they have to do dot products of the vectors representing the data points and the new data that you're dealing with. Now if you're going into very, very high dimensions or even infinite dimensions, storing these vectors or doing the dot products is computationally really expensive. So what kernel methods do is find a function that allows you to take two vectors,
00:17:47
Speaker
and that function produces a dot product, not of the vectors in the lower dimensions, but the dot product of the vectors in the higher dimensions. So basically, let's say there's a function k of two vectors, x and y;
00:18:03
Speaker
then the function k gives you the dot product of x and y, not in the low dimensions, but as if these vectors were projected into the higher dimensions. So you are basically able to do computations in the higher-dimensional space without ever actually converting your vectors and stepping into the higher-dimensional space. Computationally you are doing your work in the low-dimensional space, but the effective results you're getting represent what's happening in the higher-dimensional space.
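Here is a small sketch of that idea, using the degree-2 polynomial kernel as an assumed example: the kernel value computed entirely in two dimensions equals the dot product of explicitly mapped three-dimensional features, without ever constructing those features.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = v
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def kernel(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2, computed in the low-dimensional space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(y)))   # dot product taken in the 3-D feature space -> 16.0
print(kernel(x, y))             # same number, computed without ever leaving 2-D -> 16.0
```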
00:18:28
Speaker
So in the high-dimensional space, you can find a linear divide. These linear classifiers are beautiful. So you work in the high-dimensional space, get the linear divide, and then you project back into the lower dimensions. And in the lower dimensions, that straight line looks like a very convoluted curve. So you've essentially ended up finding a nonlinear boundary between two classes of data, because you projected your data up into high-dimensional space, found this linear classifier, and then projected it back.
00:18:57
Speaker
But computationally, you're always working in the low-dimensional space. There are lots and lots of different kinds of kernels, and they become more and more mathematically sophisticated. So that's a separate field in itself, trying to design these kernels. What kinds of things do you project into the higher-dimensional space? What are the features in the higher-dimensional space that you're working with? There are textbooks written on that entire thing by itself. Exactly. What's really so special about support vector machines and kernel methods?
00:19:30
Speaker
I think the special thing is really that support vector machines find what are called optimal

Neural Networks' Resurgence

00:19:38
Speaker
margins. If you have two classes of data and you want to find a divide between those two, you want to find a straight line or a curve that separates those two classes of data, so that when you get a new piece of data and you want to figure out whether it belongs to class A or class B,
00:19:55
Speaker
you want to be able to do that as correctly as possible. You'll always make some mistakes, but if the dividing line that you found between the two clusters of data is optimal, then the mistakes you'll make when you're classifying unseen data are lowered.
00:20:11
Speaker
I mean, there will always be mistakes, but you're basically lowering your probability of making mistakes. So that's support vector machines. But support vector machines find a linear divide between two classes of data, and they'll work in any dimensional space. But if you keep going into higher and higher dimensions,
00:20:29
Speaker
then it gets computationally intractable. So that's where kernel methods come in. In the combination of support vector machines and kernel methods, what kernel methods do, like I just said, is project the data into high dimensions and find the linear divide in high dimensions, while the computation is all happening in the lower-dimensional space.
00:20:48
Speaker
So literally, you're finding some straight line in, I don't know, 10,000 dimensions, 100,000 dimensions. It doesn't matter what it is. It could even be infinite dimensions. And in infinite dimensions, you will always find a straight line to separate two classes of data. But when you project that back into whatever dimensions you started with, 100 dimensions or 200 dimensions, that straight line turns into a very nonlinear curve.
00:21:13
Speaker
And so the combination of support vector machines and this ability to find a straight line in higher dimensions without computing in higher dimensions, that's the power of support vector machines. And they have been used for all sorts of classification problems prior to neural networks becoming a big deal. OK, awesome.
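Assuming scikit-learn is available, here is a hedged sketch of a kernel SVM finding a nonlinear boundary on data that no straight line can separate in two dimensions; the dataset and hyperparameters are illustrative choices, not taken from the book.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: impossible to separate with a straight line in 2-D.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An RBF-kernel SVM implicitly works in a much higher-dimensional space,
# where a maximum-margin linear divide does exist.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))   # typically close to 1.0 on this toy data
```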
00:21:33
Speaker
Now, as we see machine learning progressing on multiple fronts, where do artificial neural networks start coming back into the fray? When did this happen, and why?
00:21:48
Speaker
I mentioned earlier that when Minsky and Papert made this case against single-layer neural networks, saying that they couldn't even solve the XOR problem, a lot of people dropped research in that field. But there were a few people who persisted. And again, in my book, I take a few examples of the kinds of work that kept going on.
00:22:17
Speaker
So one big development happened in the early 1980s. John Hopfield developed something called Hopfield networks. Hopfield networks are essentially, again, a certain kind of architecture of these artificial neurons. They are connected bidirectionally. So, normally in a neural network, you have the input coming in from one side.
00:22:42
Speaker
It goes into one layer of neurons, that layer of neurons produces an output, that goes into a subsequent layer, and it keeps going in one direction. These are called feedforward networks, so the output of a neuron never feeds back to itself or to neurons in the layer before it. Those are the standard things that we see today, these feedforward networks. John Hopfield developed a slightly different architecture where you could have recurrent connections, where a neuron's output could feed back into other neurons in its own layer. So you get this bidirectional neural network, and he showed how
00:23:21
Speaker
you could store memories in these networks. So what you do is essentially figure out how to store an image in such a network. And the basic idea there is that you adjust the weights of the network such that when you store your image, the network has what is called a minimal energy, and there's a way of defining what the energy of a neural network means in this context. The math shows you that the outputs of the neurons can represent the pixels of an image. Let's say there are a hundred neurons, right? Each one of those neurons is outputting something, and you take that as the value of a pixel. So you've got a hundred pixels, and that's a 10 by 10 image, right?
00:24:08
Speaker
Yes. So if you store the image in this particular Hopfield network, then the math shows that the network is basically at its lowest energy state. And now what you do is perturb the network, right? You change some of the pixel values. So suddenly the image is corrupted and you've got all sorts of noise in it. But the dynamics of the network are such that it wants to go back to its lowest energy state.
00:24:34
Speaker
And so if you perturb the network, which means you have perturbed the memory of the image, then the dynamics of the network are such that all the neurons will start firing in response to the nearby neurons. And eventually they will all settle back to the state that represents the minimal energy state. And now when you read off the output of the neurons, you're going to get back the original image, because that image represents the lowest energy state. So that was a very big deal in 1982.
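Here is a minimal sketch of that idea, assuming the standard Hebbian storage rule for a Hopfield network with +1/-1 neuron states; the stored "image" is just a made-up 100-pixel pattern, and the update is done synchronously for simplicity rather than one neuron at a time.

```python
import numpy as np

rng = np.random.default_rng(1)

# A single stored "image": 100 pixels, each +1 or -1 (think of it as a 10x10 pattern).
pattern = rng.choice([-1, 1], size=100)

# Hebbian storage: the weight matrix is the outer product of the pattern with itself,
# with no self-connections. This makes the stored pattern a low-energy state.
W = np.outer(pattern, pattern)
np.fill_diagonal(W, 0)

# Corrupt the memory: flip 20 random pixels.
noisy = pattern.copy()
flip = rng.choice(100, size=20, replace=False)
noisy[flip] *= -1

# Recall: repeatedly move every neuron toward the sign of its weighted input,
# letting the dynamics settle back to the low-energy state.
state = noisy.copy()
for _ in range(10):
    state = np.sign(W @ state)
    state[state == 0] = 1   # break ties deterministically

print("pixels recovered:", int(np.sum(state == pattern)), "out of 100")
```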
00:25:05
Speaker
John Hopfield came up with that. It's not the kind of network that we use today, but it was like the opening salvo of the return of neural networks. Again, there are other things that happened, but I think the real key development was 1986, when Rumelhart, Hinton, and Williams came up with the backpropagation algorithm. I have to make sure that we get the history right, in the sense that this algorithm is often attributed to Rumelhart, Hinton, and Williams, but
00:25:41
Speaker
the idea of backpropagation goes all the way back to Frank Rosenblatt in the late 1950s, early 1960s. He knew that in order to train multilayer neural networks, which have more than one layer, you need some kind of algorithm. He actually called it backpropagation in his book, but did not know how to do it. And from the 1960s all the way to the mid-1980s, there were various people who attempted to solve this problem in different ways, not necessarily for neural networks. There was stuff going on in control systems, in rocketry, in economics. So these ideas were all there in different forms, but it was the Rumelhart, Hinton, and Williams paper that clearly showed how to do this for deep neural networks.
00:26:29
Speaker
And deep here simply means that the neural network has one layer that is sandwiched between the input and output layers. So it's hidden: it's hidden from the input side and it's hidden from the output side. So any neural network which has at least one hidden layer is technically a deep neural network, and that requires the backpropagation algorithm to train it. And so that was a big, big deal in 1986.
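Here is a minimal numpy sketch of backpropagation training one hidden layer on the XOR problem that single-layer networks cannot solve; the layer sizes, loss, learning rate, and iteration count are illustrative choices, not the 1986 paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so at least one hidden layer is needed.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units feeding a single output unit.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network predictions

    # Backward pass: propagate error gradients from the output back through the hidden layer.
    d_out = out - y                       # cross-entropy gradient at the output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden pre-activation

    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should end up close to [0, 1, 1, 0]
```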
00:26:54
Speaker
And maybe, again somewhat subjectively, 1989 brought the so-called universal approximation theorem. It's one thing to say that you can train a deep neural network to do something, but it's completely another thing to mathematically prove that there is no limit to what a neural network can learn. That's what the universal approximation theorem did. So there's one way to think about what neural networks are doing. They're essentially given some input x
00:27:27
Speaker
and a desired output y, and the neural network has to find the function f, where y is equal to f of x, right? And that function can be anything. It depends on what kind of correlation exists between your inputs and outputs. And that correlation exists in the data, right? So the training data is basically some form of pairs of input and output that you're providing the neural network.
00:27:54
Speaker
You don't know how the input and output are connected, what kind of function maps the input to the output; it's like a black box, right? So the neural network algorithm kind of figures out what that function f is, the one that maps x to y. Now, what the universal approximation theorem shows is that if you have at least a single-hidden-layer network, then, given enough artificial neurons, which means given enough computation, a neural network will approximate almost any function that is necessary to correctly map the input to the output. Again, given enough artificial neurons. Of course, in principle that's possible, but in practice you never have a big enough neural network, so it will always be an approximation of the real function.
00:28:49
Speaker
But that's what the theorem shows very clearly: a neural network is what's called a universal function approximator. It'll approximate any function, no matter how complex that function. And these things were really big developments through the 1980s. I think that set the stage for further research, because finally people said, oh, okay, these things actually can do amazing things. They still didn't have the computational power at this stage to do much with it. But the principles were falling into place.
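Here is a constructive sketch of the flavor of the theorem (not its proof): a single hidden layer of steep sigmoid units with hand-picked weights approximates sin(x) as a staircase, and adding more units makes the staircase finer; the grid construction and steepness value are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))   # clipped to avoid overflow warnings

def one_layer_approx(f, x, n_hidden, steepness=1000.0):
    """Approximate f with one hidden layer of steep sigmoid units (a staircase construction).

    Each hidden unit switches on at a grid point b[i]; its output weight is the jump
    f(b[i]) - f(b[i-1]), so the weighted sum rebuilds f step by step.
    """
    b = np.linspace(x.min(), x.max(), n_hidden + 1)            # grid of switch-on points
    jumps = np.diff(f(b))                                       # one output weight per hidden unit
    hidden = sigmoid(steepness * (x[:, None] - b[None, 1:]))    # hidden activations, (len(x), n_hidden)
    return f(b[0]) + hidden @ jumps

x = np.linspace(0.0, 2 * np.pi, 1000)
for n in (10, 100, 1000):
    err = np.max(np.abs(one_layer_approx(np.sin, x, n) - np.sin(x)))
    print(f"{n:4d} hidden units: max error {err:.3f}")   # the error shrinks as the hidden layer grows
```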

Modern Neural Network Advancements

00:29:20
Speaker
Sure. Now, we know that those advances weren't enough. Support vector machines dominated through the 1990s and early 2000s, but what changed?
00:29:33
Speaker
I think in order for neural networks to become what they've become today, a few things had to change. One was the need for really large datasets. Neural networks are extremely data hungry, and they need a lot of either human-annotated data or just basically a lot of data, and that didn't happen until the end of the first decade of the 2000s. We had the ImageNet dataset. That was one of the first real large-scale datasets for image recognition.
00:30:13
Speaker
And then the other thing that had to change: neural networks do a lot of matrix manipulations, and at some point people figured out that the GPUs that were really being used for video games were ideal processors for doing a lot of these matrix manipulations.
00:30:37
Speaker
So it was a combination of the availability of large datasets, the backpropagation algorithm to train these networks, and the use of GPUs that finally brought neural networks to the fore. I think they became really powerful, and that's when the first neural-network-based image recognition system, which was AlexNet in 2012, broke through and made such a big splash. So after that, neural networks pushed aside support vector machines. And yeah, it really took all these elements, like the availability of large datasets
00:31:17
Speaker
because of the internet, the use of GPUs, and backpropagation, all of that coming together, that made the difference. Absolutely. Now, your book ends with a look at how deep neural networks are challenging standard ideas of how machine learning works. Tell us a little bit about that.
00:31:39
Speaker
Yeah, so one of the things standard machine learning theory says is that if your model is over-parameterized, there are a lot of different knobs that you can tune in order to fit your data. And if there are way too many parameters, then you'll essentially end up overfitting the data. You don't want to fit your data that well, because there might be noise in the data.
00:32:05
Speaker
And when you overfit, you end up fitting the noise also. And so when you then start testing your model on new data, which might have different kinds of noise, you're not going to do well on the new data that you haven't seen. So if you're trying to make predictions about unseen data, but you have overfit your model to the training data in such a way that you're picking up all the nuances that exist in the training data, then when you see new data, you're going to make lots of mistakes.
00:32:34
Speaker
So basically, traditional machine learning theory says that you have to find the right balance between underfitting and overfitting. You have to be somewhere in the middle, the Goldilocks zone. But the very intriguing part about neural networks is that they are actually extremely over-parameterized. They have way too many parameters, and yet they don't seem to overfit the data. They seem to generalize well to unseen data, as long as the data is drawn from the same underlying distribution. So there's something going on that we don't fully understand. Standard machine learning theory is unable to explain why neural networks don't overfit and why they generalize despite having enormous numbers of parameters. I mean, when you think of modern neural networks,
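As a minimal sketch of the classical picture being contrasted here, using polynomial regression as a stand-in model: the over-parameterized fit drives training error down but does worse on fresh data. The data, noise level, and degrees are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = np.sort(rng.uniform(-1, 1, n))
    y = np.sin(np.pi * x) + 0.2 * rng.normal(size=n)   # true signal plus noise
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)      # fit a polynomial of the given degree
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Classical expectation: degree 15 drives the training error toward zero but does worse
# on the test set than the moderate-capacity degree-4 fit. Giant neural networks,
# surprisingly, often escape this trade-off.
```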
00:33:26
Speaker
we're talking of half a trillion parameters, even a trillion parameters. Those are massive. And of course, the data is also very massive. But nonetheless, there's something about the way these networks are functioning that still hasn't been theoretically well explained.
00:33:45
Speaker
And so the book kind of ends at that point, where we point out that we have entered what one of the researchers I spoke to called terra incognita. Basically, this is unknown terrain as far as machine learning is concerned. We have a lot of empirical evidence about how these networks behave, but we don't necessarily have a good theoretical understanding

Theories of Machine Consciousness

00:34:08
Speaker
of what's happening. I know that there's always been this larger question of, are machines conscious? Is it computational? I know that you've given talks on that sort of stuff before, and maybe I'll leave this as a teaser for future work from you, right? You talked a little bit in the book about an example from when you gave a talk on whether machines are really conscious.
00:34:34
Speaker
Will they ever get to that level? This is a really tough and hard question, not because I don't want to answer it, but because I think it really depends on what one means by consciousness and what it might be. So there are certain ways of thinking about what consciousness is. If you take those definitions, then machines are not going to be conscious, because there are people who will argue that consciousness is really a property of biological systems and the way biological systems are organized, and that consciousness has properties that cannot be explained by the physics of the material world, and so on. So that's a particular way of thinking about consciousness. If that's the case, it's unlikely that machines are going to be conscious anytime in the near future.
00:35:26
Speaker
But there are other ways of thinking about consciousness where it's very likely that we'll end up with machines that are conscious in the way that biological systems are conscious. They may not be conscious of all of the stuff that we are conscious of, but machines will be conscious of what they are conscious of. So it really hinges on definitions of consciousness and what we mean by it. So if you take a purely materialistic perspective on what consciousness might be, that it is just an emergent property of the way matter is organized, and in this case the matter that comprises us, the biological system, since we are biological machines,
00:36:11
Speaker
then in principle, and I'm not saying this is practical or that it will happen anytime soon, but in principle, yeah, it's entirely possible that machines will be conscious.

Foundational Mathematics in AI

00:36:25
Speaker
Interesting take on it. Yeah, I mean, honestly, at this point there is enough of a lack of evidence that we can't nail down which theory or which idea about consciousness is correct. So all of them are in the running.
00:36:40
Speaker
And depending on how the empirical evidence falls, one of these theories hopefully will start making more and more sense. And then we will be able to say with more certainty whether machines will be conscious or not.
00:36:57
Speaker
Absolutely. Now, out of curiosity, when writing this book, what was one of your favorite things to learn or relearn? For me, personally, you know, I'm like you, I'm an engineer; I trained as an electronics and computer engineer. And as an electronics engineer, you do a fair bit of math.
00:37:21
Speaker
But in those days, you didn't quite pay attention to what the math was saying or doing. So just rediscovering how powerful some of these basic ideas are, like linear algebra, calculus, probability and statistics, and that all of these things underlie machine learning. And we are building machines that are learning now.
00:37:43
Speaker
And it turns out that all this math that you learned in the final year of high school or the first year of undergraduate is enough to appreciate what is happening with AI today. And that, to me, was the most beautiful thing: that all these basic ideas that I had learned a long, long time ago, and probably never paid enough attention to how powerful they were, are now front and center in the technology that is going to influence our lives going forward. Who would have thought, at least I didn't, that linear algebra would be so central to what's happening right now? We all learned about vectors and matrices, and they just seemed like a whole bunch of manipulating of numbers.
00:38:30
Speaker
But that's what these machines are doing. And all these algorithms depend on having an understanding of at least the basics of linear algebra to figure out what the hell is happening.

Societal Engagement with AI Mathematics

00:38:42
Speaker
What is one big thing that you would like readers to take away from this or learn from the book? From the book, or from listening to the podcast. My desire for readers, when they encounter this book or the podcast, is that, like we said in the very beginning, there needs to be a whole bunch of people who are going to start appreciating and understanding the basic mathematics that underlies the technology that we are unleashing on this world.
00:39:13
Speaker
And it's almost like a responsibility for those of us who want to get a deeper understanding that we try. It's not easy, and yet it's beautiful enough that, if we persist, there's both joy in appreciating what's happening and also the prospect of becoming more aware of what we're in for. I mean, this technology is not going away. It's going to get embedded deeper and deeper in our society. And we can't leave this technology to just the people building it. We need others to be able to understand it enough to question them and hold people accountable for what they're building.
00:39:59
Speaker
And I personally don't see how we can get to that point without some appreciation of the math. I completely agree.

Conclusion and Farewells

00:40:09
Speaker
Yeah. So I thank you for coming on Breaking Math, and I'm so happy that you could join us today. Thank you very much, Autumn. This has been my pleasure entirely.