
42: Maybe? (Probability and Statistics)

Breaking Math Podcast
788 Plays · 5 years ago

Statistics is a field that is considered boring by a lot of people, including a huge amount of mathematicians. This may be because the history of statistics starts in a sort of humdrum way: collecting information on the population for use by the state. However, it has blossomed into a beautiful field with its fundamental roots in measure theory, and with some very interesting properties. So what is statistics? What is Bayes' theorem? And what are the differences between the frequentist and Bayesian approaches to a problem?


Distributed under a Creative Commons Attribution-ShareAlike 4.0 International License (creativecommons.org)


Ways to support the show:

Patreon Become a monthly supporter at patreon.com/breakingmath

Transcript

The Evolution of Statistics

00:00:00
Speaker
Statistics is a field that is considered boring by a lot of people, including a huge number of mathematicians. This may be because the history of statistics, and often the practice of statistics, starts in a sort of humdrum way: collecting information on the population for use by the state or by business.
00:00:17
Speaker
However, it has blossomed into a beautiful field with its fundamental roots in measure theory and with some very interesting properties. So what is statistics? What is Bayes' theorem? And what are the differences between the frequentist and Bayesian approaches to a problem? All this and more on this episode of Breaking Math. Episode 42, maybe?

Meet the Host and Support the Show

00:00:54
Speaker
I'm Sophia and this is Breaking Math. With us we have on again Matt Barbato. Matt, thanks for being on the podcast. Thank you for having me again. So what's your impression of statistics from your life, I guess?
00:01:06
Speaker
Well, other than the business-related stuff that I have to do for not-fun reasons, I am a sports person, so I love statistics. Well, that's cool. Yeah. So like in fantasy sports, I think a lot of people who don't necessarily consider themselves into math might get a little headway into math through things like fantasy sports: analyzing how a player is going to do, how they did last season, what they might do next season.
00:01:35
Speaker
Yeah, it's cool. It's all about prediction and overcoming the human inability to really deal with probability-related data. But before we continue with that, you can support us on patreon.com slash breakingmath. You can follow us on Twitter at breakingmathpod. We have a website breakingmathpodcast.com and our Facebook page is facebook.com slash breakingmathpodcast.
00:01:58
Speaker
You can buy posters there for $22.46, which is pi to the e dollars. Thank you for laughing at our price. It is meant to be laughed at. And $12.50 shipping and handling.
00:02:15
Speaker
One of the posters explains the math behind the mathematical objects in Einstein's general theory of relativity, the Einstein field equations, which are R mu nu minus one half R g mu nu equals 8 pi G times T mu nu. And that describes how space curves when there's mass, and how mass follows the curvature of space. So without further ado, let's go into statistics.

Origins and Definitions of Statistics

00:02:47
Speaker
So first we're going to talk about the history of the term statistics. And statistics comes from statisticum collegium, which means council of state. So it was literally just that they collected information on the population. I mean, you could think of tons of stuff that you'd need to run a big empire, for example, right?
00:03:07
Speaker
Yeah, and I'm assuming the word state has the same root as something like status. So statistics just shows you the status of a certain situation that you're measuring, I'm assuming, coming from that etymology. Yeah, I think you could decompose it like that. Yeah, I could justify anything in my mind, at least. Very postmodern. Yes. But probability is different than statistics, obviously. Statistics is just the pieces of data.
00:03:36
Speaker
while probability is analyzing the data. Well, I don't know. Really, statistics is the study of analyzing the data, but probability is a property of the data. And that's from probable: first from Old French probable, then Latin probabilis, which means credible, testable, or agreeable, which is from probare, which means to try or to test.
00:04:02
Speaker
So the study of statistics is the study of where we've been, and probability is the test of what might happen next. Yeah, and it's actually really closely related to the difference between Bayesian and frequentist statistics. Because frequentist statistics asks: what can we say about this thing that keeps happening? While the Bayesian approach asks: what do we know about the future given our current worldview?

Frequentist vs. Bayesian Statistics

00:04:28
Speaker
You might think that the difference is just philosophical, but there are real mathematical consequences. I think I might have an example of that. As I said, I'm a sports guy, and one of the things that people do with all these statistics that have been taken on professional sports since the 1860s is make physical representations of them. One interesting one I saw, which is probably not interesting to many other people, is that the number of people named Bob peaked in the 1960s in professional sports.
00:04:56
Speaker
and the number of people named Bob in professional sports today is at an all-time low. That could show you a lot of things about society as a whole and what it values. Studying that in a frequentist way would be saying: knowing how many Bobs there have been in sports, how many can we predict in the future?
00:05:21
Speaker
Whereas knowing the situation that might cause a Bob to be in sports, what can we say about the probability of this person becoming a sporty Bob? Yeah, so the frequentist one just shows a downward trend. And the other one would be: well, since everybody's naming their kid after math puns, we're not going to get another Bob in sports.
00:05:40
Speaker
Yeah, I mean, yeah, it's definitely like the difference between, well, we'll talk about Bayes' theorem right after we talk about random variables, which is coming up next.

Understanding Random Variables

00:05:54
Speaker
OK, so now we're going to talk about random variables. And a random variable is just like a variable with more than one outcome, really. But the probability has to be either well-defined or implied or something. The proportion of people who wear hats on a given Tuesday is a random variable. And, for example, a die being rolled is represented by a random variable.
00:06:21
Speaker
And I mean, you can think of all sorts of things that are represented this way. I mean, sports: XKCD basically called sports "we just came up with a random number generator; let's develop narratives about it." Yeah. To focus on sports: in an at-bat in baseball, there's only a certain number of possible outcomes, but you can't predict which outcome it will be. You can eventually figure out the probability of each, but
00:06:49
Speaker
Yeah, and the probability you figure out in statistics is actually very distinct from the true probability. Well, I have a question, actually, then. OK, cool. So a die, or a coin flip. That's a fixed probability, I'm assuming, right? Yeah, 50-50, or 1 in 6 on each side. But in the example of people who wear hats on a given Tuesday, there's no real hard and fast rule for how many people that is.
00:07:16
Speaker
No, but there are a fixed number of outcomes, because people are either wearing hats or they're not, right? So this is represented as a random variable because we have a fixed number of outcomes.
00:07:33
Speaker
I guess the probability doesn't really have to be defined. The probability in the true universe is defined in so many different ways. For example, take whether it's going to rain. That is a random variable that is not independent from the hat variable, because more hats are worn during rainy times.
00:07:54
Speaker
So, I mean, you could probably think of two random variables in sports that are correlated, like height and something else. Yeah, so we'll go back to baseball. If a pitcher is left-handed, they're likely to put in a left-handed batter.
00:08:14
Speaker
Yeah, exactly. The probability of a left-handed batter is not independent of the probability of a left-handed pitcher. So those are two random variables.
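The dependence between random variables described above (rain and hats, pitcher and batter handedness) can be sketched with a quick simulation. The 30%, 70%, and 20% rates below are made-up numbers purely for illustration:

```python
import random

random.seed(0)

def sample_day():
    # Hypothetical numbers: it rains 30% of days; people wear hats
    # 70% of the time when it rains, 20% of the time otherwise.
    rain = random.random() < 0.3
    hat = random.random() < (0.7 if rain else 0.2)
    return rain, hat

days = [sample_day() for _ in range(100_000)]
p_hat = sum(h for _, h in days) / len(days)
rainy = [h for r, h in days if r]
p_hat_given_rain = sum(rainy) / len(rainy)

print(p_hat)             # roughly 0.35 = 0.3 * 0.7 + 0.7 * 0.2
print(p_hat_given_rain)  # roughly 0.70: knowing it rained changes the odds
```

Since the two estimates differ, the hat variable is not independent of the rain variable, which is exactly the point being made.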

Probability Spaces Explained

00:08:23
Speaker
The random variable is the probability that a random pitcher in this event is going to do this, not just the event at this literal instant, you know?
00:08:39
Speaker
Because truly there is no randomness except really at the quantum level, so really what we have on Earth is essentially randomness. So let's actually just take a look at how a random variable is defined. First, before we talk about exactly what a random variable is mathematically, we have to talk about what a probability space is. And so first of all, there is the sample space, or the set of all outcomes. So, everything that could possibly happen
00:09:06
Speaker
So for example, let's say that the probability space is a day, okay? Anything that happens during that day, any outcome at all, is part of Omega. Okay. So all results that happened during that day are part of Omega.
00:09:25
Speaker
Yeah, and then we define a sigma algebra F on Omega. These are called the events, and events are one or more outcomes happening. So basically, a sigma algebra is a set of sets, closed under complement with respect to the sample space, and closed under countable unions. What that means is that if "everything happens" is in the sigma algebra F, then "nothing happens" also has to be in there, yeah. And it also requires the definition of a probability measure.
00:09:54
Speaker
And the probability measure tells us how probable it is for those outcomes to happen, for the events to happen. So for example, let's say we're doing a coin flip, right? Yes. I'm familiar with coins. We have heads and we have tails, right? Yes. So our Omega is heads and tails. So the probability of heads happening and tails not happening is 50%, right? Yes. And tails and not heads is 50%, right? Yes.
00:10:23
Speaker
But the probability of heads or tails happening, which is the probability of our entire Omega, is 1. And that's important: in a probability space, the probability of all the events together is 1. So that's how we define the probability space.
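The coin-flip probability space just described can be written out explicitly. This is a minimal sketch, with the full power set of Omega standing in for the sigma-algebra F:

```python
from itertools import chain, combinations

# Sample space Omega for one coin flip.
omega = frozenset({"heads", "tails"})

# The sigma-algebra F: here, the power set of Omega.
def power_set(s):
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = power_set(omega)  # the four events: {}, {heads}, {tails}, {heads, tails}

# Probability measure: additive over outcomes, each outcome weighing 1/2.
def P(event):
    return sum(0.5 for outcome in event)

print(P(frozenset()))           # 0 -- "nothing happens"
print(P(frozenset({"heads"})))  # 0.5
print(P(omega))                 # 1.0 -- the whole sample space
```

Note that F contains both Omega itself and the empty set, matching the closure-under-complement requirement mentioned above.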
00:10:39
Speaker
Cool. And so that is just a probability space. But then you have a random variable, which is... Okay, so any event that happens, right? That's in F, in the probability space. So what a random variable does is, for each event that could happen in a probability space, it assigns it a value.
00:11:01
Speaker
So, for example, we have the six sides of a die. If we label them 1 through 6, those labels are pretty much in the probability space. However, what we interpret about those labels, their numerical value 1, 2, 3, 4, 5, or 6, is defined by a random variable. So a random variable is pretty much a function from a probability space to anything else.
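A quick sketch of that definition: the random variable X below is literally a function from die-face labels to numeric values (the face names are just illustrative stand-ins for outcomes):

```python
from fractions import Fraction

# A random variable is a function from outcomes to values.  Here the
# sample space is the six faces of a die, and X maps each face label
# to its numeric value.
omega = ["one", "two", "three", "four", "five", "six"]
X = {face: value for value, face in enumerate(omega, start=1)}

# Expected value of X under the uniform measure (1/6 per face),
# computed with exact fractions.
expectation = sum(X[face] * Fraction(1, 6) for face in omega)
print(expectation)  # 7/2, i.e. 3.5
```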
00:11:25
Speaker
So outside where we're recording right now, there is a light. Every 30 seconds, the light turns red. And that light is always going to turn red; the probability of that happening is one. What happens during that red light is the variable. How many cars stop at that red light? What were the colors of those cars? How many people are in the cars? And so on.
00:11:49
Speaker
Yeah, all those are random variables. And they would all be events in our probability space too, if we're doing one day in the life of Earth. And you have two different types of random variables.

Exploring Random Variables

00:12:02
Speaker
You have discrete, which is like dice, or heads or tails. Or like, is this person wearing a hat or aren't they? Or are there one, two, three, four, or five ties in the shop? Or, what combination of marbles will I pick out if I pick 15 marbles out of a random collection of marbles?
00:12:18
Speaker
These are more concrete examples that are easier to wrap your head around. Oh, yeah. But it's important to look at random variables as anything random, because there's also continuous random variables, such as when you throw a dart and it hits a dartboard in a random position. OK. Because that's continuous, right? There's an infinite amount of places that dart can hit. Yeah. And then the velocity of gas particles is another example. OK. Now I'm interested.
00:12:43
Speaker
Yeah, because there's a root mean squared speed, which is related to the temperature. Actually, it's proportional to the square root of the temperature, and it basically says how fast molecules are moving in a gas. And it was discovered in a really cool way.
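For the curious, the root-mean-square speed being described is sqrt(3kT/m) for an ideal gas, which is indeed proportional to the square root of the temperature. A small sketch, with nitrogen at room temperature as an illustrative case:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def v_rms(temp_kelvin, molar_mass_kg_per_mol):
    # Root-mean-square speed of an ideal gas: sqrt(3kT/m),
    # proportional to the square root of the temperature.
    AVOGADRO = 6.02214076e23
    m = molar_mass_kg_per_mol / AVOGADRO  # mass of one molecule
    return math.sqrt(3 * K_B * temp_kelvin / m)

# Nitrogen (N2, about 0.028 kg/mol) at roughly room temperature, 293 K:
print(round(v_rms(293, 0.028)))  # ~511 m/s
```

Quadrupling the temperature doubles the speed, which is the square-root relationship in action.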
00:13:05
Speaker
There's actually a book that I'm writing as a side project as well as doing Breaking Math. I'll say more about that later on some other episode, but there's a whole chapter in it about how this... It's a cool proof. Maybe we'll do an episode on it one day, where you take arrows going from the center of a shell that represent the
00:13:29
Speaker
velocities in all those different directions. And then you look at a shell of arrows, and you look at the density of the shell of arrows. And you ask, how many arrows are dense in this place? But there's an infinite number of arrows. And anyway, basically,
00:13:45
Speaker
after some weird finagling, you get a probability distribution that is smooth, but it's not like the normal distribution, which we'll talk about in a second, where it's a bell curve. It's not a bell curve, because we're just talking about the speed.
00:14:06
Speaker
And does this work for spray paint? Depending on the temperature and other factors, when you press that thing, the stuff flowing through the air reacts differently. Yeah, that could be it. Yeah, definitely. All those different particles hitting the place. If there's any uncertainty, there's a random variable to talk about it.

Distributions and Phenomena

00:14:30
Speaker
Distributions are a way of looking at a random variable and seeing how probable a certain event is. So there are some continuous distributions and there are discrete ones. But one example is height. Height actually has the normal distribution, normally. And the normal distribution is what happens when you have a bunch of random variables interacting.
00:14:59
Speaker
Specifically, summed together. The more random variables you sum, the more likely it is that the result will be a normal distribution, which is the central limit theorem, kind of the fundamental theorem of statistics. So if you take the height of everybody, let's say every man in the United States, the average height may be somewhere around 5'10", 5'11", but there'll still be outliers. You might get the odd 7'2" person.
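The summing effect just described is easy to see by simulation. This sketch sums 20 dice (an arbitrary choice) many times over, and the totals come out looking normal:

```python
import random
import statistics

random.seed(42)

# Sum many independent random variables (here, 20 dice), and the total
# tends toward a normal distribution: the central limit theorem.
totals = [sum(random.randint(1, 6) for _ in range(20)) for _ in range(50_000)]

mu = statistics.mean(totals)
sigma = statistics.stdev(totals)
within_one_sd = sum(mu - sigma <= t <= mu + sigma for t in totals) / len(totals)

print(round(mu, 1))             # ~70.0 (20 dice at 3.5 average each)
print(round(within_one_sd, 2))  # ~0.68, matching the normal distribution
```

The fraction landing within one standard deviation of the mean is already close to the normal distribution's 68%, even though a single die is nothing like a bell curve.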
00:15:26
Speaker
Yeah, and everybody's probably seen the bell curve. It's the one that starts out near zero, has a hump in the middle, and then goes back to zero. So for the hump: let's say it's centered at the average height, say 5'10", and let's say the standard deviation, as it's called, is 3 inches.
00:15:52
Speaker
So that would make a bell curve of a certain shape. And the standard deviation is a way of measuring how varied the heights are. So the wider the bell curve is, the more different the heights are. Because let's say you want to see how probable it is to be between average height and, say, 3 inches above average height.
00:16:20
Speaker
you would have to look at the area below the curve between 0 and 3 inches above average. And since it's a normal distribution, that would be 34%, because there's this thing called the 68-95-99.7 rule. OK. What is the 68-95.95 rule? It's 68, 95, 99.7. 99.7. I'm sorry.
00:16:47
Speaker
And what it's saying is that in a normal distribution, there's going to be 68% between 1 standard deviation below and 1 standard deviation above, 95% for 2 standard deviations on either side, and 99.7% for 3 standard deviations. I think I understand.
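The 68-95-99.7 rule can be checked directly from the normal distribution's cumulative distribution function, which the standard library can express through the error function; a minimal sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Cumulative distribution function of the normal distribution,
    # written with the error function (no external libraries needed).
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Probability mass within k standard deviations of the mean:
for k in (1, 2, 3):
    p = normal_cdf(k) - normal_cdf(-k)
    print(k, round(p, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The 34% figure from the height example is half of the one-sigma number: `normal_cdf(1) - normal_cdf(0)` comes out to about 0.3413.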
00:17:05
Speaker
So the same thing that works for height works with, for example, IQ. IQ is a really stupid test, because all it does is measure certain types of very specific knowledge. But on any IQ test, if you want to see how many people score between, for example, 85 and 115: the standard deviation of IQ is usually set at 15 points,
00:17:32
Speaker
So it's 68%, because that's negative 1 to 1 standard deviations, and so on. And then you have the Poisson distribution. And that's used to measure things like radioactive decay. Poisson as in fish? Yeah, it's named after a guy named Poisson. Oh, a guy named Poisson.
00:17:50
Speaker
Yeah, I didn't connect him and fish until recently. But, for example, radioactive decay, right? So the Poisson distribution measures the number of events happening randomly in a fixed interval, usually. It's used to approximate that, and usually it works. Like in radioactive decay, it works. So it's like, what's the probability that 10 events are going to happen? Like we said, it gets less and less.
00:18:16
Speaker
So even in this randomness, at this scale, you can see some kind of pattern. Oh yeah. For example, let's say somebody runs a business that is a call center. Yes. The number of calls that they receive in an hour can be represented by a Poisson distribution. Or, in biology, the number of mutations on a strand of DNA. It just applies when you're measuring the number of events happening in a fixed period of time.
00:18:42
Speaker
Okay, so a call center may get 10 calls in an hour, or it may get 100 calls in an hour, but the higher volumes are going to be less probable. On the other side, does the lower volume also go down? It starts out pretty high, near the average, and it goes on to infinity, but each step is a little less probable. Okay.
00:19:07
Speaker
and it goes down pretty much exponentially. There are some Poisson distributions where it goes up first, kind of like a little hill, then goes back down. But yeah, there are all kinds of assumptions that you have to make using a Poisson distribution. Like with any distribution, there's a certain kind of situation the distribution is used for. The important part is that the average rate at which the events occur has to be constant.
00:19:33
Speaker
So that's why radioactive decay is such a good example. Yeah, because there's a certain amount of time that's going to happen. Or in the call center example, we can choose an hour or a day.
00:19:42
Speaker
But what I meant was the radioactive thing. Like, okay, if it's an isotope that takes a long time to decay, there are some that take, I think, 80,000 years or more. Like carbon. Yeah, exactly. So if you look at the number of particle emissions from carbon, you'll notice that at any given time it'll be modeled by a Poisson distribution, but you can't really model long periods of time using that. You'd have to use a more complicated distribution.
00:20:12
Speaker
I think I understand. All right, so now we're going to look at Bayes' theorem.

Bayes' Theorem in Practice

00:20:20
Speaker
So have you heard of Bayes' theorem? I have not. Okay, so there's a thing called conditional probability. So probability of something given something else. So the probability that
00:20:33
Speaker
somebody will sleep 20 hours a day given that they are a cat, is different than the probability that somebody will sleep all day in general. So it's probability of A given B. Yeah. So going back to one of the car examples, we can say the likelihood of a convertible stopping at a red light is 1. And conditionally, if it's raining, the likelihood of a convertible stopping at a red light is
00:21:01
Speaker
Yeah, exactly. The probability of this convertible stopping at a red light given the fact that it is raining. And the way that it's written is P, open parenthesis, then something like "car stops", then a vertical bar, a pipe, then "it is raining", close parenthesis.
00:21:22
Speaker
Cool. And so what's interesting is that Bayes looked at something and he realized that the probability of A occurring given that B has already occurred is equal to the probability of A and B both having occurred divided by the probability that B has occurred.
00:21:40
Speaker
That seems to make sense, just as you said. Yeah, right. Because if a and b both occur, you know that b has occurred. So if you divide it by the probability that b has occurred, you're amplifying it, basically. So for example, let's just do a quick example.
00:21:59
Speaker
So let's suppose that one quarter of clouds can be described as both raining and having been seeded. So the probability of it is raining and the cloud has been seeded is one quarter. Now let's suppose the probability of the cloud raining at all is three quarters, so that would be measured by three quarters of the clouds raining at any given time.
00:22:21
Speaker
That would mean that if we divide the top one by the bottom one, the probability that it is raining and the cloud has been seeded, divided by the probability that it is raining, we will get the probability that the cloud has been seeded, given that it is raining. And that is one-third. So yeah, it could be thought of as a ratio. And what's weird is that there are some weird things that you discover about this approach.
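The cloud-seeding arithmetic works out exactly as stated; a short sketch using exact fractions:

```python
from fractions import Fraction

# Numbers from the example: P(raining AND seeded) = 1/4, P(raining) = 3/4.
p_rain_and_seeded = Fraction(1, 4)
p_rain = Fraction(3, 4)

# Conditional probability:
# P(seeded | raining) = P(seeded AND raining) / P(raining)
p_seeded_given_rain = p_rain_and_seeded / p_rain
print(p_seeded_given_rain)  # 1/3
```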
00:22:46
Speaker
So let's say there's a test for some deadly disease, right? Yes. And there's a test for this disease that only gives false positives or false negatives one-tenth of the time. Okay. So it's 90% accurate on both sides. So let's say that, let's let some random variable A represent the likelihood that someone is sick and B mean being random variable for the likelihood that somebody tests positive for the illness, right? Okay.
00:23:14
Speaker
So P of B is equal to the probability of a true positive times the likelihood that someone is sick, plus the probability of a false positive times the likelihood that someone is not sick. Because P of B is saying that you test positive at all, right? So you have to count the true positives plus the false positives.
00:23:40
Speaker
So that means, if you work out the math, that there's an 18% chance that someone will test positive, with no other information known about them, right? Because we know that one tenth of people have the disease, right? So the probability of being sick is one tenth. So we count the nine-tenths true-positive rate times the one tenth who actually have the disease, plus the... Yeah, it makes sense, right? Yes.
00:24:08
Speaker
So for 90% of the people with the disease, the test will be accurate; for 10% it will not. And 90% of the people who don't have the disease will actually test negative; 10% of them will not.
00:24:23
Speaker
Yeah, so we're seeing that most of the time, 90% of the time, it's accurate for people who have the disease. So if we know that a person has the disease: 90% accuracy. So P of A and B, the probability that somebody is actually sick and tests positive, is the likelihood that somebody is sick times the nine tenths of the time that it's a true positive.
00:24:45
Speaker
So you multiply them together, right? Okay. So that is nine tenths? Nine tenths times one tenth. Nine tenths times one tenth. Because there's a 10 percent chance that they're actually sick, and a 90 percent chance that a sick person tests positive. And the result of that is nine percent.
00:25:11
Speaker
Yeah, so now we divide: 9% divided by 18% is 50%. That would be P of A given B in the formula. And so what that is saying is that if somebody tests positive, there's only a 50% chance that they're actually sick.
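The whole disease-test example can be redone in a few lines, using the rates from the discussion (a 1-in-10 base rate and 90% accuracy both ways):

```python
from fractions import Fraction

p_sick = Fraction(1, 10)              # base rate: 1 in 10 people are sick
p_pos_given_sick = Fraction(9, 10)    # true positive rate
p_pos_given_healthy = Fraction(1, 10) # false positive rate

# Total probability of testing positive at all:
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
print(p_pos)  # 9/50, i.e. 18%

# Bayes' theorem:
# P(sick | positive) = P(positive | sick) * P(sick) / P(positive)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # 1/2 -- a positive test is only a coin flip
```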
00:25:31
Speaker
Even though at the beginning of this, I thought this test is great. I'm not so sure about the test. I'm not even sure about the disease. I'm worried. Sophia, now I'm worried. Worry. It's going to get you. But yeah, so literally this test is completely useless for like working with a big population, obviously.
00:25:50
Speaker
Yeah. And it's really fun to apply this to just random things. I don't know, you'll find opportunities to use this in your life. And another way of stating this is that the probability of A occurring given B, times the probability of B, is equal to the probability of B given A, times the probability of A. OK. I'm going to write that down.
00:26:19
Speaker
It is nice and symmetric. It is nice and symmetric. Let me get that tattooed. Okay, so Lindley's paradox has to do with the difference between Bayesian and frequentist statistics.

Comparing Statistical Approaches

00:26:35
Speaker
So under the frequentist approach, you're really looking at the
00:26:41
Speaker
how well an explanation fits an observation, like the absolute amount, versus the Bayesian approach, which is asking: is this explanation better than the alternative explanation? So if you have the null hypothesis, the frequentist is really asking: is H null a poor explanation or a good explanation for the observation? Versus the Bayesian approach asking:
00:27:08
Speaker
Is H null a better explanation than H1? And sometimes it can be by a lot, and sometimes by not as much. Can you give an example of this? Yeah, sure. The Wikipedia article has the example of a city where 49,581 boys and 48,870 girls have been born over a certain time period. I hope they've been born.
00:27:35
Speaker
The observed proportion of male births is 0.5036. And so we assume that it's a binomial variable. An example of a binomial variable is flipping a coin: the number of heads and tails. So it's the number of successes in yes-no trials. So we assume these births are binomial.
00:28:01
Speaker
Yeah. And so the hypothesis is that the binomial variable has a parameter called theta, which says what the chance of success is. So theta for a fair coin would be 0.5. Yes. And theta for a coin that lands on heads twice for every tails once, some kind of weirdly weighted magic coin with more weight on one side than the other, would be 0.67 and 0.3.
00:28:29
Speaker
No, it'd be one third and two thirds, not 0.67 and 0.3 like you just said. Oh. Just kidding.
00:28:34
Speaker
You cannot use approximations on a math podcast. I'm sorry. I'm so sorry. I made a faux pas. So we're trying to see if there's a strange number of male births in the city. So the null hypothesis is that there's not, and that theta equals 0.5. The alternative is that theta is not equal to 0.5.
00:28:59
Speaker
And basically, through some math, you find the p-value, using a normal approximation if you want to. And you see that the p-value would be 0.0235. But the Bayesian approach is that you set the prior probabilities
00:29:18
Speaker
to 0.5 each, so H null and the alternative are each half probable, basically. You kind of split the difference and then you continue from there; it's a way to use Bayes' theorem. Using some kind of complicated stuff, we see that the posterior probability of H null in this case is 0.95, versus the p-value of 0.0235. And that is saying that H null is a better explanation than H1.
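Both numbers in this example can be reproduced approximately in a few lines. This is a sketch under two simplifications: the binomial likelihood is replaced by its normal approximation, and the 1/(n+1) marginal likelihood under H1 comes from assuming a uniform prior on theta:

```python
import math

# Birth data from the example: 49,581 boys out of 98,451 births.
boys = 49_581
n = 49_581 + 48_870
theta0 = 0.5

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Frequentist side: two-sided p-value under H0, using the normal
# approximation to the binomial.
mean = n * theta0
sd = math.sqrt(n * theta0 * (1 - theta0))
z = (boys - mean) / sd
p_value = 2 * (1 - normal_cdf(z))
print(p_value)  # ~0.023, below the 5% level, so a frequentist rejects H0

# Bayesian side: prior 1/2 on H0 (theta = 0.5) and 1/2 on H1 (theta
# uniform on [0, 1]).  Under H1 the marginal likelihood of the data is
# exactly 1 / (n + 1); under H0 we approximate the binomial likelihood
# with a normal density.
lik_h0 = math.exp(-z * z / 2) / (sd * math.sqrt(2 * math.pi))
lik_h1 = 1 / (n + 1)
posterior_h0 = lik_h0 / (lik_h0 + lik_h1)
print(round(posterior_h0, 2))  # 0.95 -- the posterior strongly favors H0
```

The same data thus rejects H0 by the frequentist criterion while favoring it 19-to-1 by the Bayesian one, which is exactly Lindley's paradox.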
00:29:48
Speaker
So it seems like the Bayesian approach says it's way more likely that H0 holds than H1, right? Yes. However, if we look at the significance levels, which can be calculated, the posterior keeps going down and down for the Bayesian approach, and it converges to 0, which is consistent with rejection of the null hypothesis. So you just have to be careful when you use Bayesian approaches; you can get some outcomes that seem kind of out there.
00:30:18
Speaker
Yeah, but it's important to know both approaches, because the frequentist approach is asking something like: what is the absolute probability that H null is true? We calculate the p-value. And H null, of course, is that the proportion of boys is one half, that is, one-to-one boys to girls.
00:30:37
Speaker
However, the Bayesian approach is saying, is H null more probable than H1? It's saying, is it more probable that the null hypothesis is true than H1? So we're literally comparing hypotheses in that case.
00:30:52
Speaker
And we get a different result when we're working with really hair-splitting differences like that. And Bayesian networks are used a lot in artificial intelligence, in automated decision making and things like that. And you always make a lot of assumptions; for example, you often just assume a normal distribution. So it's abductive reasoning versus deductive reasoning.

The Impact of Statistics and Probability

00:31:21
Speaker
Deductive reasoning means you get only one possible conclusion. Abductive reasoning is where you choose the best explanation given the information you have. I believe so.
00:31:35
Speaker
Statistics and probability may be counterintuitive, but they underlie not only the structure of the microscopic world; their effects are found at any scale where generalizations can be made about things we are not certain about. We've explored different statistical distributions, the concept of random variables, and ways of looking at probability.
00:31:52
Speaker
So whether you are an analyst in government or insurance, or even just playing cards at home, probability and statistics are sure to be found. This is Sophia, and this has been Breaking Math. With us we had on Matt, and Matt, thanks for being on the show. Thank you for having me again. And is there anything you'd like to plug? Yeah, I make guitar amps, tube-style guitar amps, and I just set up a website called LoudHailerAmps.com. And how do you spell that?
00:32:27
Speaker
Right now it's just a placeholder, but hopefully soon there'll be more there. Is that the web domain that you bought right before we recorded the episode? That was the web domain I bought right before we started recording. All right, so you're witnessing, I guess, history in the making, right?