Introduction to Mathematical Concepts
00:00:00
Speaker
There are times in mathematics when we are generalizing the behavior of many different but similar entities.
Understanding Big O Notation
00:00:05
Speaker
One such time that this happens is in the use cases of Big O notation, which include describing the long-term behavior of functions and talking about how accurate numerical calculations are. On this problem episode, we're going to discuss Big O notation and how to use it. Episode P12. Oh my god.
00:00:30
Speaker
I'm Sofia, and you're listening to a Breaking Math Problem episode.
Algorithm Runtime Explained
00:00:33
Speaker
Today I'm going to be alone again, but we're going to talk about big O notation. So you probably have run into this if you've ever studied computer science for any length of time. You may have even run into it indirectly if you've heard of something taking, for example, exponential time. We're going to talk about what this means and how to use it, and we're going to start by talking about algorithm runtime.
00:01:00
Speaker
There's an old joke about a painter painting lines on the road. There is someone employed to paint lines on the road. They come into work on Monday, fresh and happy. The foreman places a bucket of paint on the ground and says to the painter, I need you to paint lines every day. On the first day, the painter paints 100 lines. The foreman comes at the end of the day and is impressed and starts telling everyone about this painter's efficiency. On the second day, however, the painter only paints about 40 lines. The foreman comes at the end of the day and is less impressed since this is about average.
00:01:29
Speaker
The day after, they only paint around 30 lines, and the next day 25. And this continues until, by the second Friday, he's down to 16 lines per day. The foreman calls the worker for a private meeting and tells them, I was really impressed with your work the first day. You were well above average. The second day, you did about average work, but your performance has been dropping, and now you're down to less than a sixth of what your performance was on the first day. Are you having trouble with something? And the painter replies, it's not my fault. The new lines are getting further and further from the paint bucket.
00:01:58
Speaker
So we're going to use this joke to explain the difference between linear and quadratic runtime.
Mathematical Foundations of Big O
00:02:05
Speaker
So, you know, if he would have just carried the paint bucket with him, he probably would have painted a lot more lines, right? Or at least moved it every few lines. That would be what's called linear runtime, where the time it takes is in proportion to the number of things you have to do.
00:02:19
Speaker
So let's say 100 lines takes, you know, 100 minutes; 120 lines, 120 minutes; 15 lines, 15 minutes; and so on. Versus quadratic runtime, which is the total time it would take this person who leaves the paint bucket at the starting point to do a certain number of lines.
00:02:38
Speaker
If it took the painter 1 day to draw 100 lines, it would take him 4 days to draw 200 lines, 9 days to draw 300 lines, 16 days to draw 400 lines, and so on. So using this system, 100 lines takes 1 day.
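As a rough illustration (not from the episode itself), here is a minimal Python sketch of the two painters, assuming one unit of work per line plus, for the painter who leaves the bucket at the start, one extra unit of walking for each line already painted:

```python
def linear_time(n_lines):
    """Painter carries the bucket: each line costs one unit of work."""
    return sum(1 for _ in range(n_lines))

def quadratic_time(n_lines):
    """Bucket stays at the start: line k also costs about k units of walking."""
    return sum(1 + k for k in range(n_lines))

for n in (100, 200, 300, 400, 1000):
    print(n, linear_time(n), quadratic_time(n))
# Doubling the number of lines doubles the linear total,
# but roughly quadruples the quadratic one.
```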
00:02:55
Speaker
But this person finishing a thousand lines will take 100 days, whereas if it were a linear algorithm, it would only take 10 days. As you can see, there's a big difference between these two. So what does this mean, and what does it mean about big O notation? How does this relate?
00:03:15
Speaker
So this runtime can actually be described using big O notation. A linear runtime, which is carrying the paint bucket, can be described as big O of n. You just write a capital O, open parenthesis, lowercase n (or whatever case of n you want, but it's usually lowercase), close parenthesis.
00:03:35
Speaker
versus quadratic runtime, which is O of n squared. What O describes is the behavior of the function as n gets bigger and bigger and bigger. So for example, O of 4n is actually the same thing as O of n, and we'll explain why. Basically, if you take 4n divided by n as n goes to infinity,
00:04:00
Speaker
the value of this ratio just stays 4. And since it's a constant, in Big O notation the two are equivalent. That is the heart of Big O notation: if, as you go to infinity, the ratio of two functions stays constant,
00:04:15
Speaker
then they are equivalent. That is why big O expressions are written in their simplest form. For example, big O of n squared plus n would usually just be written big O of n squared, because if you think about it, n squared plus n divided by n squared just keeps getting closer and closer to 1 as you go to infinity. To plug in exact values:
00:04:38
Speaker
So for the ratio from before, n squared plus n over n squared, which simplifies to 1 plus 1 over n: when n is 1, it's 2. When n is 2, it's 1.5. Then it's 1.33, 1.25, and by the time we're at the 19th value, it's about 1.05. And if you just keep going, say out to n equals a trillion, we're at 1.000000000001.
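If you want to check those numbers yourself, here is a tiny Python loop (just an illustration, not from the episode) that prints the ratio for growing n:

```python
# Watch (n^2 + n) / n^2, which is 1 + 1/n, head toward 1 as n grows.
for n in (1, 2, 3, 4, 19, 1_000, 1_000_000_000_000):
    print(n, (n**2 + n) / n**2)
# 2.0, 1.5, 1.33..., 1.25, 1.05..., 1.001, 1.000000000001
```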
00:05:05
Speaker
And it just keeps getting closer and closer to one, and since one is a constant, you know, a value that doesn't change, the two are equivalent. So big O of n squared is what we usually write, because big O of, say, n squared plus a hundred n would be the same thing: as n gets very large, the n squared term just goes way past the rest. You can think about this geometrically as the difference between the area of a square-shaped yard with a certain side length
00:05:32
Speaker
and the amount of square footage that the measuring tape along one side would take up. If the yard is, like, one inch by one inch, the measuring tape used to measure this little yard, if it's like a half inch wide, would cover about half the area of the yard itself. But if the yard is, you know, 100 miles on each side, then the tape's area would be almost negligible.
00:05:54
Speaker
So that's why x over x squared tends to zero as x goes to infinity. This mathematical concept is known as a limit, and we've talked about it before.
Dynamic Arrays and Amortized Time
00:06:05
Speaker
Suffice it to say, it's one of the foundational concepts in formalizing calculus. And not just formalizing calculus: it's used actively within calculus itself, and in the various fields that use calculus. There's not really one big field just called calculus.
00:06:19
Speaker
So here's some examples of O for various algorithms. There's O of 1. So O of 1 just means in constant time. One example is that in the analysis of algorithms, memory access is considered to be O of 1, and so is addition.
00:06:35
Speaker
And of course, there's some nuance within this. For example, a machine that has to index 100 petabytes of RAM probably could not, using modern technology, access memory as quickly as a very tiny machine that only has to access 16 bits of memory. So the size of what you're talking about can actually
00:06:55
Speaker
affect the big O values of what you're talking about. And also with addition: if you're dealing with variable-length addition, you'd actually have to bump that up to big O of n. Within big O of 1, there's amortized big O of 1. That is still just big O of 1, but what it means is
00:07:12
Speaker
that an individual operation sometimes takes longer than one unit of time, but on average each operation takes about one unit of time. So what does this actually mean? Let's talk about dynamic array insertion. An array is just a list of items in a row, so you can index, say, the third or fifth item. In a lot of more bare-bones programming languages, like C and assembly, arrays are of a constant length by definition.
00:07:38
Speaker
But in other programming languages, arrays can be variable length. So a lot of times the way this is
Logarithmic Time Complexity
00:07:45
Speaker
achieved is that when the capacity of the dynamic array has been reached, the capacity is doubled. This resizing step takes big O of n time,
00:07:57
Speaker
not because of the allocation itself, which actually takes less than big O of n time with most allocators (I've seen some that do it in big O of log n time), but because copying the array over to the new memory takes big O of n time. However, most insertions take big O of 1 time, and if you take big O of n plus n times big O of 1, divided by n plus 1, you basically get a linear function over a linear function, which is a constant. So insertion is amortized big O of 1.
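To make that amortized argument concrete, here is a hedged Python sketch of a doubling dynamic array (the class name DynamicArray and its layout are just illustrative, not any particular language's built-in implementation):

```python
class DynamicArray:
    """Toy dynamic array that doubles its capacity when it fills up."""

    def __init__(self):
        self.capacity = 1
        self.length = 0
        self.storage = [None] * self.capacity

    def append(self, value):
        if self.length == self.capacity:
            # O(n) step: allocate a bigger block and copy everything over.
            self.capacity *= 2
            new_storage = [None] * self.capacity
            for i in range(self.length):
                new_storage[i] = self.storage[i]
            self.storage = new_storage
        # O(1) step: write into the next free slot.
        self.storage[self.length] = value
        self.length += 1

arr = DynamicArray()
for i in range(10):
    arr.append(i)
# The occasional O(n) copy is spread over n cheap appends,
# so each append is amortized O(1).
```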
00:08:18
Speaker
Next up among the common runtimes is big O of log n time, which is, for example, finding a value in a sorted array. So say we have a bunch of values in an array and the array is sorted. One example of this is a dictionary: all the words are sorted in lexical order.
00:08:44
Speaker
So to find a word in a dictionary, one approach is to open to the middle. If the word you want comes earlier, you take the first half and open to its middle; if it comes later, you take the second half and open to its middle; and you just keep halving like that, left and right, until you're on the right page. That's an algorithm known as binary search, and it's basically the fastest way to search a sorted array.
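Here is roughly what that dictionary trick looks like as code; a minimal Python sketch of binary search, assuming the list is already sorted:

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or None if it's absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2            # open to the middle
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1                # target is in the right half
        else:
            hi = mid - 1                # target is in the left half
    return None

words = ["apple", "banana", "cherry", "fig", "grape", "kiwi", "mango"]
print(binary_search(words, "fig"))      # 3
```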
00:09:13
Speaker
There are some algorithms actually that are big O of log log N.
00:09:18
Speaker
One is called a y-fast trie, T-R-I-E. I recommend looking that up on your own time, because it's kind of a big thing to understand. But moving on, we have big O of n, which is, for example, adding a list of numbers together, because you have to read each number and add it to the running total, and each one of those operations takes constant time. So constant plus constant plus constant, n times, is big O of n. This is also finding the minimum of a list of numbers
00:09:47
Speaker
if the numbers are not sorted, because you have to go through the whole list. And even if something only takes, on average, going through half the list, big O of n divided by 2 is just big O of n. Remember from before: they have the same asymptotic runtime, so it helps us classify this algorithm with others if we just write big O of n.
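A minimal sketch of that kind of single-pass, big O of n work in Python (the function name is just for illustration):

```python
def total_and_minimum(numbers):
    """One O(n) pass: add up the numbers and keep track of the smallest one."""
    total = 0
    minimum = numbers[0]
    for x in numbers:          # one constant-time step per element
        total += x
        if x < minimum:
            minimum = x
    return total, minimum

print(total_and_minimum([7, 2, 9, 4]))   # (22, 2)
```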
Sorting Algorithms and Time Complexities
00:10:05
Speaker
Then we have big O of n squared, which is sorting a list of numbers using bubble sort. So bubble sort is basically: let's say you wanted to sort a pack of cards, so you lay them all out on the table.
00:10:17
Speaker
You start at the first pair of cards and ask, okay, does the one on the left have a higher value than the one on the right? So let's say that spades have a higher value than clubs, clubs have a higher value than diamonds, and diamonds higher than
00:10:33
Speaker
hearts. You'd ask, okay, does this card go to the left or the right of the other one? If it goes to the right, you swap them; if it goes to the left, you don't. Then you look at the second and third cards, and so on, all the way to the end. Then you go right back to the beginning, move one card over, start there, and go to the end again.
00:10:51
Speaker
If you do that over and over again, going all the way down, moving one card over each time, and comparing pairs of cards as you go, you end up with a sorted array. That's bubble sort, and it was one of the first sorting algorithms discovered.
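Here is a short Python sketch of bubble sort as just described, using plain numbers instead of cards; the two nested passes are where the big O of n squared comes from:

```python
def bubble_sort(cards):
    """Repeatedly compare adjacent pairs and swap them when out of order."""
    cards = list(cards)                     # work on a copy
    n = len(cards)
    for i in range(n - 1):                  # n - 1 passes...
        for j in range(n - 1 - i):          # ...each comparing adjacent pairs
            if cards[j] > cards[j + 1]:
                cards[j], cards[j + 1] = cards[j + 1], cards[j]
    return cards

print(bubble_sort([5, 1, 4, 2, 8]))         # [1, 2, 4, 5, 8]
```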
Complex Algorithms and Challenges
00:11:09
Speaker
But it was beaten by algorithms that are O of n log n, n times log n, which sort things in a quicker time. Usually these are divide-and-conquer algorithms, like quicksort: you pick a random number within the list, you separate the array into the numbers that are less than and greater than this value, then you divide each of those lists the same way, and you keep going until you're down to just pairs of numbers,
00:11:38
Speaker
which you sort directly, and then you just go all the way back up and everything is sorted. And then there's heapsort, which would take a long time to describe, but basically you keep arranging different subsets of the array into what is called a heap. Maybe we'll do an episode on data structures at some point.
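And here is a minimal sketch of the quicksort idea described above, written the simple (not in-place) way, with the pivot chosen at random:

```python
import random

def quicksort(values):
    """Divide and conquer: pick a pivot, split around it, sort each side."""
    if len(values) <= 1:
        return values
    pivot = random.choice(values)
    less    = [v for v in values if v < pivot]
    equal   = [v for v in values if v == pivot]
    greater = [v for v in values if v > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 6, 1, 8, 2, 9, 5]))     # [1, 2, 3, 5, 6, 8, 9]
```

On average the splitting goes about log n levels deep and each level does about n work, which is where the n log n comes from.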
00:12:00
Speaker
And then if you want to go to algorithms that have really bad time complexity, you have things like big O of 2 to the n, which means that every additional piece of data or input that you have to process approximately doubles the runtime of your algorithm.
00:12:22
Speaker
So there's a thing called the traveling salesman problem, and what this is, is: let's say there's a salesman who wants to go from city to city, and they know how long it takes to get from any one city to another. What is the most efficient route? This actually turns out to be a very difficult problem for a large number of cities. If you think about it, two to the one hundredth is already over a nonillion, around 10 to the 30th,
00:12:48
Speaker
which means that if you could do a billion of these calculations every second (which is probably generous: even though each clock cycle in a modern computer is about a third of a nanosecond, each of these operations usually takes hundreds of nanoseconds, if not microseconds), it would take basically 10 to the 21st seconds. And how many years would that be?
00:13:17
Speaker
That works out to tens of trillions of years to solve 100 cities, even using dynamic programming.
Logarithmic Bases in Big O
00:13:24
Speaker
And let me just stop right now to explain why I haven't said what base of logarithm we're using and have just been saying log. It's because if you take the base 3 log of a number divided by the base 10 log of that same number, you're always just going to get log base 3 of 10,
00:13:44
Speaker
no matter what. So basically, for any two logarithmic bases, you're going to get a constant ratio. If you think about it, this makes sense. Remember, a logarithm is the opposite of exponentiation. So since 2 times 2 times 2 times 2 times 2, that is to say 2 to the fifth, is 32, log base 2 of 32 is 5.
00:14:08
Speaker
Because log base 2 of 32 is asking: what power would 2 have to be raised to in order to equal the thing we're taking the log of, which is 32? So then we have the change of base formula, meaning that if we want to go from one log base to another,
00:14:26
Speaker
say we want to take the log of something in an arbitrary base, like on a garden-variety scientific calculator: you take the log of the number divided by the log of the base. And it turns out that if you take two numbers written this way and divide them by one another, you end up with one log over the other, which is a constant.
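You can check that constant-ratio fact numerically; here is a quick Python snippet (just an illustration) using the standard math module:

```python
import math

# The ratio of logs in two different bases is the same constant for every x.
for x in (10, 100, 12345, 1e9):
    print(math.log(x, 3) / math.log(x, 10))   # always about 2.0959
print(math.log(10, 3))                         # log base 3 of 10, same number
```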
00:14:48
Speaker
And remember, a constant times a big O is just that same big O. We'll actually go over how to do arithmetic with big O notation in just a second. But suffice it to say, this would take a very long time, tens of trillions of years.
00:15:06
Speaker
And so what do people do when they want to solve this? Because if you think about it, there are a lot of real-life applications where this needs to be done, for example, planning routes for delivery drivers. In these cases, approximations are used, usually by means of
00:15:22
Speaker
randomized algorithms, or stochastic algorithms, as they're known. These basically use randomness to try candidate solutions within various constraints and refine them according to some measure of how close a candidate is to the desired solution. This covers everything from evolutionary programming
00:15:46
Speaker
to, at its heart, machine learning. And if we're doing the traveling salesman problem by brute force, we get big O of n factorial, that's to say one times two times three all the way up to n, and this just explodes. I would play with this on paper if you have time.
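Here is a tiny brute-force traveling salesman sketch in Python, with a made-up four-city distance table, just to make the factorial blow-up tangible; trying every ordering is fine for four cities and hopeless for a hundred:

```python
from itertools import permutations

# distances[i][j] is the made-up cost of going from city i to city j.
distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def shortest_tour(distances):
    """Try every ordering of the cities after city 0: that's (n-1)! routes."""
    n = len(distances)
    best_cost, best_route = float("inf"), None
    for rest in permutations(range(1, n)):
        route = (0,) + rest + (0,)              # start and end at city 0
        cost = sum(distances[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_cost, best_route = cost, route
    return best_cost, best_route

print(shortest_tour(distances))    # (23, (0, 1, 3, 2, 0)) for this table
```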
00:16:05
Speaker
So how do we do arithmetic on big O values? Well, we discussed before how multiplying by a constant maintains the same big O value. So if f is big O of n squared, that means that f of n divided by n squared approaches a constant as n grows. And you can write "is" with a symbol that looks like a little c with a line
00:16:25
Speaker
going through it from left to right. It's like a little e as well; it kind of looks like a quickly drawn epsilon,
Arithmetic Operations with Big O
00:16:32
Speaker
but it's written in the same place the equals sign would go. It's the element-of symbol, so you write something like f, element-of, big O of n squared.
00:16:44
Speaker
So multiplying f by a constant is not going to change the big O notation. So k times big O of g, you know, g being the function in our big O, would just remain big O of g. Now let's say we have some f that's big O of g and f prime that's big O of g prime.
00:17:05
Speaker
we would know then that f times f prime is O of g times g prime. Basically, what that's saying is that O of n times O of n is O of n squared. And you run into this kind of thing when you're doing loops within loops. So in programming, there's a thing called a for loop.
00:17:28
Speaker
You write something like: for i from 1 to 100. So it sets i to 1 and then does the stuff within the loop. You could do, like, print i, which writes the current value of i, which would be 1, to the screen. Then it sets i to 2 and does the same stuff within the loop again. So if you do nested loops, which is loops within loops,
00:17:54
Speaker
you get to big O of n squared, then n cubed, then n to the fourth for four nested loops, and so on. And this is why a lot of rookie programmers have code that runs for a very long time: they nest so many loops within each other.
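Here is what that looks like in Python; three nested loops over the same n give n cubed executions of the innermost line (just an illustration, not code from the episode):

```python
n = 200
steps = 0
for i in range(n):            # O(n)
    for j in range(n):        # nested: O(n^2) so far
        for k in range(n):    # nested again: O(n^3)
            steps += 1
print(steps)                  # 8,000,000 steps, already sluggish in plain Python
```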
00:18:08
Speaker
All right, and if you have a function times a big O function, then you can just pull the function inside the big O. So 4n squared times big O of n would be big O of 4n cubed, which is itself big O of n cubed.
00:18:26
Speaker
And when you add two big O functions, you just take the one that grows quicker. So big O of n plus big O of n squared is big O of n squared.
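To gather the arithmetic rules from the last few minutes in one place, here they are restated in standard notation (k is a constant; f, g, and g' are functions):

```latex
\begin{align*}
k \cdot O(g) &= O(g)                &&\text{constants drop out}\\
O(g) \cdot O(g') &= O(g \cdot g')   &&\text{e.g. } O(n)\cdot O(n) = O(n^2)\\
f \cdot O(g) &= O(f \cdot g)        &&\text{e.g. } 4n^2 \cdot O(n) = O(n^3)\\
O(g) + O(g') &= O(\max(g, g'))      &&\text{the faster-growing one wins}
\end{align*}
```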
00:18:43
Speaker
One other use case for big O notation is in numerical analysis. Let's say we want to calculate a value and want to know how accurate we are. So for example, take e to the x, which is at the heart of a lot of numerical computations. A lot of the time, under the hood, when a calculator does something like 3.4 to the x, it's actually computing e to the x times the natural log of 3.4, which ends up being the same thing.
00:19:01
Speaker
So e to the x is equal to 1 over 0 factorial, which is 1 over 1, plus x over 1 factorial, plus x squared over 2 factorial, plus x cubed over 3 factorial, and so on. As the terms go on, the powers of x don't grow as quickly as the factorials, so you can see that as a big O problem in itself. But there's another big O problem here too.
00:19:27
Speaker
If we're dealing with very small numbers, which sometimes happens in specific numerical applications, so x much less than one in this context, sometimes we don't even have to write that many terms. We could stop at three terms, so our approximation would be e to the x is about 1 plus x plus x squared over 2. And we would only
00:19:50
Speaker
need this rough of an approximation when we're dealing with numbers that are extremely small, already near the size of the error term. So how do we write the amount of error? What we can actually do is write the error, which you sometimes write with an epsilon, as O of x cubed. So here big O notation is kind of used in the reverse way, where x, and therefore x cubed, is supposed to be very tiny.
00:20:16
Speaker
So for example, if x is 10 to the negative fifth, x cubed would be 10 to the negative 15th. So an error of big O of x cubed would mean that some constant times 10 to the negative 15th is in the ballpark of the error term. This is used constantly in numerical analysis, so if you go into scientific computing or any field like that, you're going to have to learn it.
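Here is a small Python check of that claim (an illustration, not from the episode): compare math.exp to the three-term approximation and watch the error track x cubed, with the ratio hovering near one sixth, the constant hidden inside the big O:

```python
import math

def exp_three_terms(x):
    """Truncated series: e^x ≈ 1 + x + x^2/2, with an O(x^3) error for small x."""
    return 1 + x + x**2 / 2

for x in (1e-1, 1e-2, 1e-3):
    error = abs(math.exp(x) - exp_three_terms(x))
    print(x, error, error / x**3)   # the last column stays close to 1/6
```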
Conclusion and Merchandise
00:20:45
Speaker
And this notation is pretty extensible. So for example, there's a sub-exponential notation, which we won't go into right now, but that's 2 to the little o of n, which can be analyzed using regular big O notation. I'll leave that as an exercise to the listener, like a lazy mathematician who totally isn't lazy and just decided to do a podcast.
00:21:13
Speaker
But I also recommend to kind of learn more about the big O notation, kind of get a feel for it. On Wikipedia, check out the articles on big O notation, time complexity, and amortized analysis.
00:21:29
Speaker
All right, everyone. So thank you for listening to this problem episode about Big O notation. It's one of my favorite topics and I'm glad I got to do a full thing kind of dedicated to it. I mean, we've talked about it before, but yeah, I'm glad that I got to talk about it. So as always, keep on breaking math. Oh, and if you want to buy my poster,
00:21:53
Speaker
which is a summary, a pretty big overview, but it gets into some core linear algebra concepts and stuff like that. It's talking about tensors and how to apply tensors to geometry intuitively and stuff like that. It's $15.15 plus $4.50 shipping and handling if you're in the United States.
00:22:16
Speaker
If you're outside of the United States, just message me. But yeah, I've seen people put it up in their offices, and I'm thrilled with all the good feedback we've gotten on our website. Oh, to get those, go to facebook.com slash breakingmathpodcast and just click on shop. We're gonna have our own dedicated shop pretty soon, but until then, just keep going there for our merchandise. All right, I'm babbling, so see you, everybody.