3: TMI (Information Theory)

Breaking Math Podcast
1.6k Plays, 8 years ago
“ABABABABABABABAB”. How much information was that? You may say “sixteen letters worth”, but is that the true answer? You could describe what you just read as “AB 8 times”, and save a bunch of characters, and yet have the same information. But what is information in the context of mathematics? The answer is nothing short of miraculous; information theory has applications in telephony, human language, and even physics. So what is information theory, and what can we learn from it?

---

Support this podcast: https://anchor.fm/breakingmathpodcast/support
Transcript

Licensing & New Shows

00:00:59
Speaker
This episode is distributed under a Creative Commons Attribution Share-alike 4.0 international license. For more information, visit creativecommons.org.
00:01:10
Speaker
This is Gabriel Hesch from the Breaking Math podcast. I am thrilled to announce that we have teamed up with a particle physicist and science fiction author named Dr. Alex Alaniz to create a brand new science show on YouTube. Our new show is called the Turing Rabbit Holes podcast. It's named after the mathematician and father of modern computer science, Alan Turing. You can find us at youtube.com slash Turing Rabbit Holes podcast.
00:01:35
Speaker
And that's spelled T-U-R-I-N-G, Rabbit Holes podcast. Be sure to like and subscribe for great visual storytelling about the history of mathematics, physics, and computer science and the impact that they have had on society. Again, that's youtube.com slash Turing Rabbit Holes podcast. And now back to the Breaking Math podcast.

Website Updates & Introduction to Information Theory

00:01:57
Speaker
Somebody stole our website. Oh no, whatever shall we do?
00:02:01
Speaker
I mean, I guess you could go to the new website, http://breakingmathpodcast.app, with no www for all you old timers. So breakingmathpodcast.app. I mean, if you're into that sort of thing.
00:02:22
Speaker
A B A B A B A B A B A B A B A B. How much information was that?
00:02:37
Speaker
You may say 16 letters worth, but is that the true answer? You could describe what you just heard as AB 8 times, and save a bunch of characters, and yet have the same information. But what is information in the context of mathematics? The answer is nothing short of miraculous.

Foundations of Information Theory

00:02:56
Speaker
Information theory has applications in telephony, human language, and even physics. So what is information theory and what can we learn from it? All of this and more on this episode of Breaking Math, Episode 3, TMI. I'm Jonathan Baca.
00:03:20
Speaker
And I'm Gabriel Hesch. And you're listening to Breaking Math. And today with us we have Gideon Dilettante of the Odd. And also we have Hannah Butler, recent math graduate. And today we're talking about information theory. Now, Gabriel, what is the essence of information theory? Okay, so the essence of information theory is basically saying that everything in the universe can also be described in terms of information, that is, in terms of bits.
00:03:49
Speaker
Exactly. And a bit is the amount of information in a fair yes or no choice. But we have a few constraints that we need to put on information. First of all, it needs to match our intuitive concept of information. That's to say, two pages can contain twice as much information as one page, basically.
00:04:14
Speaker
It seems self-evident, but that's a huge constraint. And that's what Claude Shannon, the father of information theory as he's known, designed the formulas around.
00:04:24
Speaker
Yeah. In fact, we don't hear a whole lot about information theory as a concept, I would say, until you're, in my experience, at least a grad student in engineering. Well, in some of the higher level classes as an undergraduate, you definitely hear about it if you're in communications. Now, Claude Shannon is the mathematician and engineer who, as you just said, is the founder of information theory. He published a paper, and oh goodness, the date escapes me. 1948, at Bell Labs.
00:04:48
Speaker
Oh, very good. He was actually trying to solve a specific problem. How much information can you transmit across a wire? Yes. Because there's an application in telephony where you try to multiplex, that's to say, transmit more than one voice at a time on one wire. He was trying to see how many you could.
00:05:10
Speaker
Yes. So obviously, now this is interesting. One common theme in our Breaking Math podcast, when we're talking about ideas, is what led to the ideas. In this case, of course, it was a specific engineering problem that then led to the concepts in information theory. But then from that, a new way of looking at the universe. And that is, as we will discuss in this episode, looking at the universe in terms of information.

Practical Applications

00:05:37
Speaker
Yes. Now, Hannah, what is your experience with information theory? I don't really have any. So it's not, so it's fair to say it is not part of the main math curriculum then. No, not at all now. And given what we said about how useful it can be, why do you think that might be? I'm not really sure.
00:05:57
Speaker
I have noticed that engineers and mathematicians have a lot of times very different things that they study. Information theory, however, has a very useful theoretical as well as practical aspect.
00:06:12
Speaker
You know, I'm wondering if it's kind of more a matter of tradition. You know, it is a relatively new field. It's not even 100 years old at this point. So it could be that it's just a matter of tradition. Of course, it's a very, very relevant topic and it's a very hot topic, both in physics and engineering. So perhaps that's why some folks haven't heard about it. Now, Gideon, do you have any experience with information theory or information?
00:06:35
Speaker
Just a very basic awareness of it. What is this basic awareness, sort of? Just an awareness that information can be broken down into the smallest units, and that's about it, really.
00:06:51
Speaker
And let's see what that is. This next section will provide an example that will illustrate for the audience exactly what information theory is. Flip a fair coin and see what it lands on. You now have one bit of information, heads or tails. Flip it again. Every 50-50 outcome gives you another bit of information. This is the true definition of a bit. Now what happens when you have more than two possible outcomes? Say what happens if you have a fair die?
00:07:20
Speaker
In the case of the fair die, you have about 2.6 bits of information per throw. This dispels a common misconception about the bit: that the bit is indivisible. The reason why is this. If I told you I rolled a die, it would take you between two and three yes or no guesses, about 2.7 on average, to guess what I rolled.
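The die figures can be checked with a short sketch. It assumes all six faces are equally likely; the function names are illustrative and not from the episode. The entropy is the base-two logarithm of the number of outcomes, while the average number of yes or no guesses comes from the best possible question tree, which is why the two numbers differ slightly.

```python
import math

def entropy_uniform(n: int) -> float:
    """Entropy in bits of one draw from n equally likely outcomes."""
    return math.log2(n)

def average_guesses_uniform(n: int) -> float:
    """Average number of yes/no questions under an optimal strategy
    for n equally likely outcomes (an optimal binary question tree)."""
    k = n.bit_length() - 1        # largest k with 2**k <= n
    deep = 2 * (n - 2 ** k)       # outcomes that need k + 1 questions
    shallow = n - deep            # outcomes that need only k questions
    return (shallow * k + deep * (k + 1)) / n

die_bits = entropy_uniform(6)
die_guesses = average_guesses_uniform(6)
print(f"entropy: {die_bits:.3f} bits, average guesses: {die_guesses:.3f}")
```

For a fair die this gives roughly 2.585 bits of entropy and 8/3 (about 2.67) guesses on average: two faces can be identified in two questions and the other four need three.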
00:07:42
Speaker
Well, that's actually a very interesting concept. So obviously I'm an electrical engineer and information theory is very much in my future this semester, in fact, but I honestly haven't done a whole lot. I never knew that you could have a fraction of a single bit of information. So that's actually quite it. But it makes sense the way you explained it with the dice. Yes. And of course, when you're storing information, more or less, you have to have bits indivisible. When you have digital storage, you can't store half a bit on a computer.
00:08:10
Speaker
That makes sense. Okay, my familiarity with information storage so far was based on that, and I get it: you can't have half a bit on a computer, but you can, in theory, have a fraction of a bit of information. Precisely. Now let's look at another scenario. What happens when the coin is unfair? Like, what happens if there's a two-thirds chance of a coin flip landing heads? In this case, there's only 0.92 bits communicated per flip.
00:08:36
Speaker
If perhaps that seems a little hard to grasp, then consider another example. Suppose a coin only lands heads up. Each flip, in that case, would give you zero bits of information. The reason why it'd give you no information is that seeing it land heads up does not tell you anything new about the coin flip. You already knew before it was flipped that it'd land heads up. Here's something else to consider. What if you did not know that the coin was unfair?
00:09:04
Speaker
You might guess that it gives you one bit of information per flip anyway. This shows that estimates of information can give you a bound on the amount of information, but not necessarily tell you how much information you have. In fact, this will almost always be an upper bound, because the more you know about any system, the more you'll know about the probabilities involved. To put it another way, the more you know about something, the less random it seems, and true randomness has the highest possible information content.
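The coin examples above follow from Shannon's entropy formula for a two-outcome source. The sketch below is a minimal illustration; `coin_entropy` is a name chosen here, not something from the episode.

```python
import math

def coin_entropy(p_heads: float) -> float:
    """Shannon entropy in bits of one flip of a coin with P(heads) = p_heads."""
    total = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:                 # an impossible outcome contributes nothing
            total -= p * math.log2(p)
    return total

for p in (0.5, 2 / 3, 0.9, 1.0):
    print(f"P(heads) = {p:.3f}: {coin_entropy(p):.3f} bits per flip")
```

The fair coin comes out at exactly one bit, the two-thirds coin at about 0.918 bits, and the coin that always lands heads at zero. Assuming a fair coin when the real one is biased overestimates the information, which is the upper bound mentioned above.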
00:09:32
Speaker
Okay. So just to, um, review those concepts, um, one bit of information, as defined in information theory, is the knowledge that you get from a 50-50 choice. Yes. Um, there are other units. There's the nat, where instead of taking the base two logarithm, you take the base e logarithm, the natural logarithm, for a more technical audience. Yeah. Yeah. Okay. I understand. Interesting. Interesting. Okay.
00:09:57
Speaker
And then there's the hartley, where you take base 10. And a bit is also known as a shannon. Hannah or Gideon, have either of you ever compressed a file?
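The three units differ only in the base of the logarithm. As a rough sketch (the helper name `information` is an assumption for illustration), the information in a single 50-50 outcome can be expressed in each unit like this:

```python
import math

def information(p: float, base: float = 2.0) -> float:
    """Information content of an event with probability p,
    in the unit set by the logarithm base (2: bits, e: nats, 10: hartleys)."""
    return -math.log(p) / math.log(base)

p = 0.5  # one fair coin flip
bits = information(p, 2)
nats = information(p, math.e)
hartleys = information(p, 10)
print(f"{bits:.3f} bits = {nats:.3f} nats = {hartleys:.3f} hartleys")
```

One bit works out to about 0.693 nats or about 0.301 hartleys.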

Data Compression Techniques

00:10:07
Speaker
Yeah, definitely. Yes. Now, as far as information theory goes, could you guess what that does?
00:10:15
Speaker
From what you're saying, I'm taking away that the more certain and less chaotic a piece of information is, the less information it carries, basically. What compression does in the ideal case, which is impossible in practice, is represent the original data using only as many bits as its information content.
00:10:41
Speaker
So even though there might be a million flips, you can compress that down if each flip is not a fair flip. Now we ran a few experiments just to show you what this means in practice. And what we did was we wrote a program that flipped a coin virtually one million times.
00:11:01
Speaker
And the first time it was fair. We had a one megabyte file and when we compressed it, it compressed to one megabyte, as you would expect. When the probability that it would land on heads was two-thirds, it compressed to 931 kilobytes. 918 kilobytes would be the theoretical ideal.
00:11:17
Speaker
When the chance was nine-tenths that it would land on heads, it compressed all the way down to 562 kilobytes from one megabyte, where 469 kilobytes would be the ideal. And when every flip landed the same way, it was less than one kilobyte of data that ended up on the hard drive, where zero would be what we would expect from the formula.
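An experiment along these lines can be approximated in a few lines. The hosts' actual program and compressor are not given in the episode, so everything here is an assumption: the flips are packed eight to a byte, the file is smaller (800,000 flips, or 100,000 bytes) to keep the run short, and Python's `zlib` stands in for whatever compressor was used.

```python
import math
import random
import zlib

def flip_bytes(n_flips: int, p_heads: float, seed: int = 0) -> bytes:
    """Pack simulated coin flips (1 = heads) into bytes, 8 flips per byte."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n_flips // 8):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (rng.random() < p_heads)
        out.append(byte)
    return bytes(out)

def ideal_bytes(n_flips: int, p_heads: float) -> float:
    """Size in bytes that the entropy formula gives for n_flips flips."""
    h = sum(-p * math.log2(p) for p in (p_heads, 1 - p_heads) if p > 0)
    return n_flips * h / 8

N = 800_000  # assumption: 100,000 bytes of packed flips
results = {}
for p in (0.5, 2 / 3, 0.9, 1.0):
    raw = flip_bytes(N, p)
    packed = zlib.compress(raw, 9)
    results[p] = (len(raw), len(packed), ideal_bytes(N, p))
    print(f"P(heads) = {p:.3f}: {len(raw)} bytes -> {len(packed)} bytes "
          f"(ideal {ideal_bytes(N, p):.0f})")
```

The pattern should match what is described above: the fair coin does not compress, the biased coins compress to somewhat more than the entropy bound, and the one-sided coin shrinks to almost nothing. A general-purpose compressor is not expected to reach the ideal figure exactly.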
00:11:36
Speaker
All right, so in this next segment, in order to really illustrate a practical application of information theory, we're going to talk about something that I just learned about recently, and that is data compression. And specifically, LZW compression.
00:11:51
Speaker
Yeah, and LZW is named after three engineers: Lempel, Ziv, and Welch. Okay, very good, yeah. And their data compression algorithm is used specifically, I believe, in images currently, is that right? Yeah, I think GIFs. Okay.
00:12:07
Speaker
So those of you who use Facebook or who use many applications online, you interact with JPEGs, also other forms of images, TIFF images, GIF images, things like that, in a very general sense. When we talk about compression, essentially what this will do is an image on a computer can be broken down into single pixels.
00:12:32
Speaker
Yes, and to explain how LZW compression works, let's take A, B, A, B, the thing I said at the beginning eight times. Oh, that's a great example. Okay. The first symbol will be represented as A. The second symbol will be represented as B.
00:12:46
Speaker
Yes. Now, if you've got a whole sequence of them, you can, of course, as you said in that opening segment, say the entire sequence, which takes a lot of breath, or you can just say AB 8 times. I don't remember quite what went with the... And I'm pretty sure what LZW would do is say A and B, then have a symbol that means AB, then another symbol that means ABA, and so on, with longer and longer symbols. Okay.
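To make the LZW idea concrete, here is a minimal textbook-style sketch applied to the opening string. It is not the exact variant used in any particular image format (real formats add details such as variable code widths and table resets), and the function names are illustrative.

```python
def lzw_encode(text: str) -> list[int]:
    """Encode text with basic LZW; codes 0-255 stand for single characters."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate            # keep extending the known phrase
        else:
            output.append(table[current])  # emit the longest known phrase
            table[candidate] = next_code   # learn the new, longer phrase
            next_code += 1
            current = ch
    if current:
        output.append(table[current])
    return output

def lzw_decode(codes: list[int]) -> str:
    """Rebuild the text, reconstructing the same table on the fly."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            entry = previous + previous[0]  # code for the phrase being defined
        pieces.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)

message = "AB" * 8
codes = lzw_encode(message)
print(codes)
print(lzw_decode(codes) == message)
```

On the sixteen-letter string this emits seven codes, standing for A, B, AB, ABA, BA, BAB, and ABAB, and decoding them gives back the original text exactly, so the scheme is lossless.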
00:13:15
Speaker
Now, real quick, so let's say you've got a picture. Let's say that you've got a TIFF image or a JPEG image. You know, you've got a beautiful picture. If you zoom in, it's just broken down into individual pixels that are, you know, red pixels or green pixels, your primary colors. What you can do is you can go through the entire thing. And if you have six green squares in a row, instead of storing all six green squares, as Jonathan said, all you do is you just have an algorithm that says green square times six.
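The "green square times six" idea is usually called run-length encoding. It is a simpler scheme than what real image formats do, and the sketch below treats pixels as plain color names purely for illustration.

```python
from itertools import groupby

def rle_encode(pixels: list[str]) -> list[tuple[str, int]]:
    """Collapse each run of identical pixels into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs: list[tuple[str, int]]) -> list[str]:
    """Expand (value, count) pairs back into the original pixel list."""
    return [value for value, count in runs for _ in range(count)]

row = ["green"] * 6 + ["red"] * 2 + ["green"]
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Nine pixels become three pairs here. On a row with no repeated neighbors the encoded form would be longer than the original, which is one reason practical formats use more elaborate methods.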
00:13:44
Speaker
And if you think about a typical image of, say, somebody at the beach, the sky is going to be a lot of repeated things. There's going to be a lot of stuff that you could get rid of in the sand because sand looks like noise to the human eye. And another point that we could bring up is the difference between lossy compression and lossless compression, just really quick for the more technical listeners out there. In lossy compression, you lose some of the original information, but keep enough for a human to appreciate it.
00:14:13
Speaker
In lossless compression, which you need for something like a book, where you can't have a bunch of gibberish in the middle of it, the result represents all of the original information. And of course, the other really cool thing is, as I was thinking about this, again, my mind was blown. Simply put, I didn't know this. And I'm learning more and more as I delve into these mathematical topics.

Interactive Examples: Playing 20 Questions

00:14:31
Speaker
We do that all the time. Writing is a form of compression.
00:14:34
Speaker
And in fact, one form that everybody in this room probably uses daily is predictive text. If you type in T-E-X on a keyboard, the next letter is probably going to be what, everybody? T. And if you say A-A-G-A, it's more likely that you meant saga. Now, on our next segment, we're going to play 20 questions. The reason why is going to become apparent as we play.
00:15:01
Speaker
Now, Leela, would you kindly go to 20q.com? Yes, of course. Leela is going to read the suggestion. She's going to basically play at the game as we say yes or no to the questions. Okay, so I'm going to pick one of the suggestions. But not tell us. And not tell you. I have chosen
00:15:20
Speaker
All right. So what should our first question be? Oh gosh, this is worth discussing. Okay. Okay. Okay. So who here is good at 20 questions? I have 16 nieces and nephews. That's right. I have a huge family. I have 16 nieces and nephews. So we, so, so 20 questions is something that comes up a lot on, on, on car rides. So obviously the broadest broadest questions possible. Um, now I don't have much experience with bread boxes outside of 20 questions, but that's a traditional first question is a bigger than a bread box.
00:15:49
Speaker
Oh, okay. Actually, you know what? I think that would be fine. So let's say we do a little vote here. Should we just go ahead and say, is it bigger than a bread box? I'm game. Okay. Let's do it. Is it bigger than a bread box?
00:16:01
Speaker
Yes. All right. Okay. Let's see. Should we find the upper bound of it then? And again, I guarantee there's an algorithmic way to do this. Well, the fastest way to do it algorithmically would be to double the thing every time. But because we're talking about colloquial knowledge, the algorithm is going to be a lot more nuanced. That's not so related to information theory as it is to computing. You all caught that, right? Just kidding. Well, the website recommends question one as, is it classified as animal, vegetable, or mineral?
00:16:31
Speaker
Oh, okay. Well, we did the bread box one. Should we do that? Should we go ahead and go with the suggestion? Yeah. Okay. Okay. We're not going to take any more suggestions from the website. Okay. That's a one-off. Is everybody cool with that? I'm cool. Yeah. I'm cool. All right. Is it animal, vegetable, or ... Oh, wait. It has to be a yes or no question. I feel like the animal would be the most likely most of the time. Should we say, is it living or dead? Yes.
00:17:01
Speaker
Is it living or dead? Is it a living? It is living. Yes. All right. Okay. So it's living and it's bigger than a bread box. So that's a lot of options. I know. I know. Wait, wait, wait, wait. Let's catch up. So are we, we're at two questions or are we at two questions? Okay. We are at two questions right now. Narrowing it down. All right. So it's alive. Um, any suggestions for the next question? Okay. We know it's alive. What will narrow it down? We can ask how many legs it has.
00:17:31
Speaker
Most animals have four legs though. We don't know that it's an animal though. Yeah. We could be a plant or it could be an insect. That's all we know. So obviously the four legs, I think that would rule out most insects that would rule out, uh, I think let's be specific here. It would also rule out fins. But most insects are smaller than a bread box. Oh, good call. We've already, okay. Okay. Okay. Guys, this is hard. This is hard. Okay. I wish we could have listeners call in. I vote for is it a plant?
00:17:59
Speaker
Uh, okay. I don't have an, uh, a competing alternative. Is it a plant? No. All right. So it's an animal. Okay. So yeah, now we are. Well, an animal, are we ruling out insects yet? Insects are animals.
00:18:14
Speaker
Okay, so I have an embarrassingly low recall of my biology. My sophomore year in high school was the last time I took biology. Oh. It had no shame on me. I got way into physics and math and I left my biology behind. All right, so maybe we should ask if it's domesticated or wild. Well, again, I think the strategy here is the broadest questions where we can rule out the most information. I feel like something like, is it a mammal?
00:18:40
Speaker
And I think that... I like that one, and it would rule out about half of the things. Okay. All right. Is it a mammal? I don't know.
00:18:48
Speaker
Oh no, good one. So I guess Google to the rescue. Let's check if it's not that easy. Okay. So this is something that I don't know if it even falls into those categories. In that case, I feel like is it a water animal or is it a land animal? Now for our listeners, we just got a little bit more information than yes or no.
00:19:12
Speaker
But that's important to note, just because now we can't say that we got three bits of information. We could say maybe we got, who knows, 4.5, who knows. This is fascinating. This whole information theory, it's like solving a mystery. It is. It seems like not knowing whether it's a mammal or not should narrow it down kind of a lot though. Yeah, because it can't be an elephant, can't be... Ooh, should we ask if it has eyes?
00:19:37
Speaker
Um, you know, again, I'm just so lost here. I am as lost as you are. So we were all lost. As you know, like if this is a high pressure situation, you know, like, I think I would ask about eyes gone to our head because we got it pretty narrowed down already. We have like another question. That's a good one. Does it have eyes? Yes. Oh, it has eyes. All right. So it has eyes and we don't know what was the one that we don't know. I forgot. We don't know if it's a mammal. Oh, oh, okay. Okay. Well, maybe we ask if it lives on land.
00:20:07
Speaker
Um, okay. And that being question six, right? Yes. We need like some, some scary, can we have some like intense music during this part of the podcast? Oh yeah. Cue the intense music. High stakes here. High stakes. I think we've got some virtual high fives here at stake. I want some virtual high fives. All right. So it has eyes. So it has to be kind of a normal animal.
00:20:33
Speaker
Unless it's not. Unless it's not. Oh, could it be mythical? Um, that's kind of a one-off. Is that a little out of options?
00:20:42
Speaker
Um, it rules out dragons and phoenixes. You know, you've been really leading the charge here. All right. You lead the charge. Oh, no, no. I was going to say, I'm, I'm, I think I'm okay with that. With that question, I can't rule it out. You know, since it's our survival, that's at stake. I'm just kidding. Since it's our pride that's at stake. Are we okay with saying, uh, is it mythical? Yeah, no, I think that's a great idea. You realize the stakes here. Okay. Okay.
00:21:09
Speaker
We literally have dozens of seconds of shame at stake. I know. Now, Leela, is it mythical? Some people think it's mythical. I think that it's real. All right. So, yes. Okay. Now, how much do we know about Leela? Because that also contributes to our information about this answer too.
00:21:27
Speaker
Yes. And of course, right now we just got a little bit more than one bit of information. Yeah. And of course we could have been more specific about how many people know that it's mythical, you know, you know, like, well, ooh, I know. Let's call it mythical. All right. Now let's see what mythical creatures do we know? Griffin, Bigfoot.
00:21:46
Speaker
Yeah, definitely something that can fall in between mammal and non-mammal. Fairies for sure, unicorns, what else is maybe a mammal? Hey, how about that Mexican one? What do you call it? Chupacabra. That's from a very funny show online. So, should we ask if it has hair or fur? I think so. Should we go for fur? Well, okay, dragons don't have fur. We're saying fairies have hair and skin, not fur.
00:22:12
Speaker
Is it mostly covered in fur? What do you think about this question? Is it mostly covered in fur?
00:22:27
Speaker
Okay. Oh, wow. Okay. So now I don't, I hope I wasn't premature in saying it's definitely mythical. Our answer was some people say so, you know, so maybe that, but if there's any doubt about it being mythical or not, okay. Lots of mythical animals fly.
00:22:43
Speaker
Okay. So let's see. Should we say, wait, does prehistoric maybe count as mythical? But that doesn't sound correct to me. Okay. Okay. Good question. Goodness. So, um, can we name some things? Um, let's see. Oh, fairies. Okay. Okay. Fairies. Um, dragons. Uh, good. Yeah. Dragons. Uh, fairies. Um, it's bigger than a bread box though. Uh, some fairies are four feet tall, according to mythology.
00:23:09
Speaker
Oh, okay. Okay. Uh, but always, you know, she, it was a definite answer. It wasn't an always answer or, you know, sometimes answer. It was, it was a definitive one. But if, if it's a prototypical example of a fairy in her mind that happens to be more than four feet tall, then it would be a definite yes. Okay. You know what? Okay. I'm feeling dragons and less trolls, or what are some Harry Potter things, like, um, an orc, a phoenix. Hippogriff. Uh, hippogriff.
00:23:35
Speaker
Let's see, do a lot of these have feathers? Only one has feathers. It's like a basilisk. Pegasus. Basilisk, I like that. Yep, yep, I like it. What else? Pegasus, should we say, does it have two or more legs?
00:23:47
Speaker
You know what? Okay, so here's, here's the thing. Yeah. Yeah, we're going to trolls, and now I kind of like the does it fly? Okay, we're going to do it. Does it fly? No, not on its own. Okay, so I'm going to say no to dragons. Yeah, let me, let me... Okay, possibly trolls, possibly orc, no to phoenix. Oh, no, I like phoenix.
00:24:11
Speaker
However, we got an additional little bit of information. Okay. Not literally a bit, but a little piece of information. We did. It doesn't fly on its own. Okay. Maybe it takes American Airlines. Yes. Snakes on a plane. In that case. Okay. Hippogriff. So we're going to say no to that. So it could be the basilisk. Now is the basilisk only Harry Potter lore? No, that's a thing. Oh yeah, that's a thing.
00:24:33
Speaker
Okay. Pegasus. Now, Pegasus, they fly? They're covered in fur, though. They totally fly. Okay. Man, and fairies fly. That would be sad if you had a fairy that didn't fly, wouldn't it? They call it a sprite. How about a gremlin? Gremlins sneak onto planes and mess everything up. But are they bigger than bread boxes? I think so. I've never seen a bread box. I've never seen a bread box. I've never seen a bread box. Okay. Okay. So, you know, if this were... Oh, gosh. How many questions have we asked? We've asked six.
00:25:03
Speaker
I think more than six. No, that's six. We'll go with six. Yeah, we'll go with six. I should keep trying. Six. All right. This is interesting. Okay. So it's mythical. Again, it doesn't fly and it's not covered in fur. What else? Okay. I honestly think that we should have a quick huddle. I'm glad we're not timed. What are some other mythical creatures? Help us out here. I'm telling you, gremlin, maybe. A big gremlin. Okay. A big gremlin. Let's see what else flies. A lot of these are humanoid.
00:25:31
Speaker
I know, right? Uh, hobbit. Don't forget. I said, well, I don't know if I should give away too much. No, don't. Okay. Because we need 20 bits of information. Where are you? You can have a little bit more than that. Okay. Okay. What else? Okay. I'm trying, I'm trying to help stop hobbit. Is that only Lord of the rings or is that a common thing?
00:25:50
Speaker
I think it's Lord of the Rings. I think they're called halflings when they're not part of the Lord of the Rings. Okay, then who runs the banks in Harry Potter? What are those things called? Goblins? Goblins, okay. They're small-ish. No, do goblins sometimes fly? Not on their own. That was very cryptic.
00:26:10
Speaker
Okay. All right. What about, um, okay. Dwarves like Santa's dwarves or Snow White and the seven dwarves or the dwarves from Lord of the Rings. Okay. Should we ask if brownies too? I don't know what a brownie brownie brownies. I don't think they're bigger than bread boxes. Okay. Do we, should we ask if it's humanoid? I think so. Yeah. Okay. Is it humanoid?
00:26:33
Speaker
Yes. Sweet. Okay. Beautiful. Beautiful. Okay. So, so now interesting questions. Is it taller than human, shorter than human? That'd be kind of good. I think that would be a phenomenal question. What do y'all think? I think you should go, is it shorter than a human? It could be giant. That'll tell us real quick. Yeah. Is it shorter than a human?
00:26:52
Speaker
It can be. Oh dear. I don't want to give out too much information. Okay. Do not. Okay. So, so let's, let's. But if I say something like it can be that is just, I'm just curious, is that one bit of information? That's a little bit more because a bit of information would be yes or no. Yes or no. And so I want to try and answer these questions with yes or no, but some of the questions that you're asking, I can't. So I'll just say, which is important. Apply.
00:27:19
Speaker
It is just important to note for natural language that a lot of times there are fractional bits. In fact, in Shannon's 1951 paper on the entropy of printed English, there is an estimate of about, I think it's around one bit per letter, which is less than you would expect. There are 26 letters, and log base two of 26, which tells you how many bits that would be if every letter were equally likely, is 4.7.
00:27:47
Speaker
So it shows that English letters are like an unfair coin flip, basically. Oh, wow. Interesting. Wow, that's just crazy. The way that he did it was this. He had text and he had people guess the next letter. He counted how many guesses it took them to get it right, and he applied his formulas to that. Well, let's get back to the game.
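A first step toward the per-letter figure can be sketched by counting letter frequencies. This is not Shannon's guessing experiment: it ignores context between letters, so it only shows the first drop below 4.7 bits, and the sample text is a short stand-in taken from this episode's description.

```python
import math
from collections import Counter

def letter_entropy(text: str) -> float:
    """Entropy in bits per letter, using single-letter frequencies only."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

uniform_bits = math.log2(26)  # every letter equally likely
sample = (
    "You may say sixteen letters worth, but is that the true answer? "
    "You could describe what you just read as AB eight times, and save "
    "a bunch of characters, and yet have the same information."
)
sample_bits = letter_entropy(sample)
print(f"uniform: {uniform_bits:.2f} bits, sample: {sample_bits:.2f} bits per letter")
```

Because some letters are far more common than others, the sample value lands below the uniform 4.7 bits. Taking context into account, as the guessing experiment does, lowers the estimate much further.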
00:28:07
Speaker
Okay. That was a fun tangent though. I really enjoyed it. Okay. So back to the game. We know it's humanoid and we know it's mythical and we know that can be shorter. So let me go ahead and do my list one more time. And here's our current list. Um, it's shorter than human. It can be. That's all we have. I think an elf. Okay. Uh, well, what are the elves that are shorter than humans?
00:28:28
Speaker
A lot of elves from Dungeons and Dragons. Because in Lord of the Rings, they are like angelic humans. That would be the evolution of mythology. It could be elves, you're right. It very well could be. What else? I'd like to add aliens to the list if you guys agree with that.
00:28:49
Speaker
Uh, that's a, that's a really good one because it can be mythical. You're right. Aliens. I mean, although true aliens, I mean, aliens must exist based on probability, talking about probability. I'm feeling like we take some stabs here at, at, at some of these questions. Should we, should we go for a, should we go for, is it an elf or is it an alien?
00:29:06
Speaker
That will give us less than one bit of information. The reason why is this: let's say there are 10 things. If we guess one of them, there's a 9 out of 10 chance that the answer is no, so on average we get only about 0.47 bits of information. So maybe we can split this list up into two categories and then ask about them. Binary search is the fastest way of searching with bits.
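The value of a question can be estimated with the same entropy formula used for the unfair coin. The sketch assumes all remaining candidates are equally likely, and `question_bits` is an illustrative name.

```python
import math

def question_bits(n_candidates: int, n_yes: int) -> float:
    """Expected bits from a yes/no question that is answered 'yes'
    for n_yes of n_candidates equally likely candidates."""
    p = n_yes / n_candidates
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

single_guess = question_bits(10, 1)  # naming one candidate out of ten
even_split = question_bits(10, 5)    # a question that halves the list
print(f"single guess: {single_guess:.3f} bits, even split: {even_split:.3f} bits")
```

Naming one candidate out of ten yields about 0.469 bits on average, while a question that splits the list in half yields a full bit, which is why halving the candidates each time is the fastest strategy.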
00:29:26
Speaker
Sure, absolutely. Okay, so let me just do... these are all the things. So I've got elf, I've got aliens. The ones in italics, that I italicized, we've already ruled out. It's not dragons. Trolls, trolls are humanoid, right? But they're bigger than humans. Okay, so we're ruling out trolls. Orc, they're bigger than humans. Are they? Yeah, in D&D they're like seven feet tall.
00:29:47
Speaker
Oh, I didn't know that. Okay, so not an orc, not an orc. Okay, very good. Then after orc, we have gremlin. I think we still have, I doubt it. I don't think it's gremlin. How about, do they live on an Earth-like... actually, that would only rule out aliens. Okay, that's a good question. Okay, so here, real quick. Gremlin, hobbit. Do we want to include hobbit in this? Hobbits are always shorter. Oh, okay. But yeah, and we have, they can be shorter. So I'm gonna include hobbits. But they can be, not they are.
00:30:15
Speaker
Oh, thank you. Good call. Not hobbits. Okay. Goblins are shorter always, right? I don't know. I don't know what really good goblins are. Dwarves are always shorter. Cool. So we've got three things, elves, aliens, and gremlins. We could ask a simple guess- Wait, gremlins are tiny, right? They're smaller than humans. Right. We've got elves and aliens. That's only two things right now. Well, let's see. We'll split up the list. Does it live in an earth-like realm?
00:30:43
Speaker
Because if it's Earth-like, it could be an alien from an Earth-like realm. Yeah, okay. It doesn't necessarily live on an Earth-like realm. May I ask, could we just say, is it alien? Is it alien? Yeah. But then an elf might be alien. Middle Earth elves. Oh my gosh. Oh my gosh. Well, okay. There's no way to splice this. This is an ambiguous murky area because the realm of fiction is so encompassing that I don't think there's a good way to differentiate.
00:31:12
Speaker
All right, let's just say, does it live on Earth or on a place based on Earth? Okay. Because if it's based on Earth, it can't be an alien. Okay, with that understanding, if it's Earth-based, like Middle Earth, right? Yeah, Middle Earth counts as Earth. Okay, okay. I get where we're going. What do you guys think? Any objections? That sounds good to me. I can't think of myself. Yeah, I can't think of anything other than that. Am I supposed to think of some eye-rolling going on? I'm just kidding. Okay, all right.
00:31:42
Speaker
Does it live on an Earth-like realm or a realm based on Earth where Middle Earth from Lord of the Rings will be considered Earth-like in this context? No. Oh, so I'm thinking aliens. Should we shoot for it? Let's do it. Okay, I'm gonna ask. I'm gonna hit the red button here. Is it an alien? Yes.
00:32:07
Speaker
Awesome. Awesome. Okay. We were going to have to go back and honestly count how many that was. Was that even 20? That was eight. Yes. Oh, and I want to point something out for our listeners as far as information theory goes, but the kind of answers that Lila was giving us, we had a little bit more than eight bits of information to work with. I'm guessing it was probably around 15.
00:32:31
Speaker
because if you look at the sliding scale, that's just my intuition. I don't have any formal math to back that up. However, if we did 20 questions, because we have more than one bit of information per question, we can do more than two to, let's say we had 30 bits total. That could be up to one billion different things. Wow. That's a lot. Yeah. 20 questions. Okay. Yes. And yeah, 20 questions where each question isn't a strict yes or no. Wow.
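The arithmetic behind that estimate can be sketched in a couple of lines. This assumes each bit doubles the number of things that can be told apart, and the 30-bit total is the hypothetical figure from the conversation, not a measurement.

```python
# Each bit of information doubles the number of distinguishable things.
strict_yes_no = 2 ** 20    # 20 questions at exactly one bit each
richer_answers = 2 ** 30   # assumed 30 bits total, as suggested above

print(f"{strict_yes_no:,}")
print(f"{richer_answers:,}")
```

So 20 strict yes-or-no questions can separate about a million things, and 30 bits can separate a little over a billion.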

Information in DNA and Evolution

00:32:59
Speaker
In our daily lives, we're most used to data being stored in magnetic disks or in flip-flops to be used by digital machines. Another, perhaps more covert, place where information is kept is in the form of DNA. In DNA, you have four possible letters, A, C, G, and T, and so you have around two bits of information per letter.
00:33:22
Speaker
The evolution of various species and the phenomena inherent in evolution, including sexual selection, convergent and divergent evolution, and other phenomena are intrinsic to information theory. The information contained in DNA is a simple message. It describes how to survive in the environment the species is given. So, one example of how information theory relates to DNA is in the creation of new information.
00:33:49
Speaker
And I think we should go through the process of how that directly happens. Oh, absolutely. This is actually extremely fascinating. Okay. So just to recap again, I am not a biologist myself. So again, I know what we just said that essentially DNA is four different starting, I guess we'll call it symbols.
00:34:09
Speaker
Yeah, it's the chemical on the DNA: adenine, I think, cytosine, guanine, and thymine. I might be wrong. And then of course we just call those A, C, G, and T. And then you said earlier, you said that essentially that each letter is, as you said, two bits of information. Can we elaborate why each of those are two bits?
00:34:26
Speaker
Yes, maximum two bits. And the reason why it's maximum two bits is because if I think A, C, G, or T, you can ask me, is it A or C, or is it G or T? And I say it's A or C. And then you say, is it A or C? And I tell you A. So it takes two yes or no questions to determine each letter.
00:34:45
Speaker
Well, that's right, that's right. Because A pairs with T and G pairs with C. Well, even if it didn't pair, the fact that there are four of them means that it's two bits. You know what? I think from a previous conversation, this is going to help explain it very well. Okay, so you said that there's four letters. If we only had two binary digits, ones and zeros, we could do a maximum of four combinations. With a one and a zero, you could do zero, zero.
00:35:15
Speaker
you could do 01, you could do 10, and you can do 11, and that's it. So because of that, you know, that's a maximum of two bits per letter. And in the scheme you could assign A to 00, C to 01, G to 10, and T to 11 if you were so inclined.
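As a small illustration of that scheme, here is a sketch that packs DNA letters into two bits each using the assignment just described. The helper names `encode` and `decode` and the sample sequence are made up for the example.

```python
# Hypothetical 2-bit code, using the assignment suggested in the conversation.
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in ENCODE.items()}

def encode(seq: str) -> str:
    """Turn a string of DNA letters into a string of bits, two per letter."""
    return "".join(ENCODE[base] for base in seq)

def decode(bits: str) -> str:
    """Read the bit string back two bits at a time."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

message = "GATTACA"          # illustrative sequence, not real data
packed = encode(message)
print(packed, len(packed))   # seven letters take fourteen bits
```

Decoding the packed bits gives back the original letters, which is the sense in which two bits per letter is enough.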
00:35:33
Speaker
Cool. Okay. This is great. This makes sense. I think I understand it quite well now. All right. So then, as you said earlier, DNA and biology, that's the language for life. That's everything we see, all of our traits, and it's quite mind-blowing. Everything from the color of our eyes to traits that we have, the kind of body type we have.
00:35:54
Speaker
Yes. And what we don't need to use anymore. For example, whales don't need their hip bones for the same things that they used to use it for. So there's information arguably lost there. Wow. Okay. I'm still wrapping my mind around this whole fact that those four letters, you know, those combine into the traits that we see now. Oh, just sorry. It's just mind blowing.
00:36:15
Speaker
Now, one interesting thing is that amino acids are represented by one of 21 characters, typically: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, and dollar sign.
00:36:30
Speaker
That's a mouthful. What's interesting about this is that the number of base pairs in a DNA sequence is not necessarily the amount of information in the DNA. In fact, it's almost always going to be a little bit less than that because you have things like what used to be called junk DNA and virus markings in DNA.
00:36:54
Speaker
Now, correct me if I'm wrong, again, I'm not up to date here. So, what was once called junk DNA, is it still junk DNA, or is there...? There are certain purposes for that. I can't recall what they are. Okay, very good. So, when we talk about, for example, in the Buchnerre chromosome, there are approximately 760 million base pairs, which translates to 3.53 billion bits of representation, roughly.
00:37:23
Speaker
but the amount of information contained is no more than 3.11 billion bits. That's a discrepancy of about 0.42 billion bits. Oh wow, that's a mouthful. So how do we break that down for listeners?
00:37:37
Speaker
Well, it's just an example of how there's less information contained than in the representation, which is always the case, basically. Just to clarify as well, again, we already established for those who are less knowledgeable about biology, such as myself, we already established the four letters in DNA. You also said base pair. Can you clarify what a base pair is? Yes, A, C, G, or T.
00:38:00
Speaker
Oh, just a single one of them. Yes. Oh, okay. It's called a pair because the letters pair with each other. Cool. Cool. Very good. Very good.
00:38:08
Speaker
Now, one way that humans create information when they breed animals is by deliberately using inbreeding. The reason why this works is because when you have two sets of DNA that are very similar, anything created is going to be a mutation. That's why it works. You take these horribly inbred horses and you breed them with healthier horses to get the traits that you want.
00:38:33
Speaker
Yeah. Wow. So, okay. So then just to clarify, all I know is that, you know, of course inbreeding is very much discouraged and I've heard in a very general term why that is. So let's just say, for example, if you have two first generation inbred horses, I wonder what kind of possible mutations you can expect or even a second generation inbred horse. Well, I don't know too much about horses, but in humans, one generation of inbreeding is likely to produce heterochromia, which is eyes of two different colors.
00:39:03
Speaker
Oh, my gosh. Oh, wow. Okay. And when we say one generation of inbreeding, that is specifically a brother and a sister who would then reproduce and have a child. Brother and sister, or multiple first-cousin marriages. Okay. Okay. I wasn't sure. Now, real quick, sorry, are multiple first cousins less dangerous than brother and sister? Yes. Brother and sister, the coefficient of relationship is 0.5. Oh, gosh. Okay. With first cousins, it's something like an eighth.
00:39:27
Speaker
Okay. All right. So then this inbreeding is a very intentional thing with respect to horses. Yes. It's designed to create new information, new traits that weren't there before. Okay. So we create new information with DNA through mutations. Okay. Yes. And yeah, this increases the chance of a mutation happening, especially because all changes are more likely to be mutations than not after a certain number of generations of inbreeding.
00:39:52
Speaker
Wow. Interesting. This is fascinating about the creation of information through breeding. So then obviously this is done for a purpose. So the hope is that we understand scientifically that you're going to have a lot of mutations that are not beneficial, but you will have mutations that are beneficial.
00:40:12
Speaker
Yes. An example of a non-beneficial mutation in most species is albinism. If a creature is albino, then it's more likely to be seen in its environment and it's more likely to be eaten. An example of a beneficial mutation would be, for example, when the mole lost its eyes, you have less chance of infection. Okay. Interesting. Do we know with respect to horse breeding, may I ask what's an example of a beneficial mutation?
00:40:40
Speaker
Oh, it could be anything from coat to trainability to any number of traits that horses are bred for.
00:40:49
Speaker
Okay, wow, this is just fascinating. I know that with respect to dogs historically, I believe that Shar-Peis were specifically bred so that when they're bitten, it didn't slow them down or inhibit them because basically they've got rolls and rolls and rolls of skin. So even though they are absolutely cute, the intention behind that was
00:41:11
Speaker
And dogs are a specific example of how DNA is very programmable, actually. Dogs and fruit flies and a couple of other animals, just a handful of animals, fall into a category of animals with slippery genomes. Okay. What do you mean by slippery genomes?
00:41:28
Speaker
For example, there's this one set of base pairs in a dog's DNA that if it's only repeated a few times, the dog has an upturned snout. If it's repeated more times, it has a downturned snout. It's very programmable. Dogs are extremely programmable.
00:41:44
Speaker
Man, in this information age, what we find out about DNA and exactly what bit is responsible for what traits is just fascinating. Especially when we unlearn it all the time when we realize that one gene is really not responsible for all incidents. It's a feedback system and it's highly nonlinear, but that's outside the scope of information. That's a fascinating thing, though, for correctness. We think we know one thing, and then we find out that the system is more complex than we originally thought.
00:42:15
Speaker
Oh, yeah. And DNA is orders of magnitude more complicated than even general relativity. Goodness. Yeah. Wow. Wow. Information normally comes from the sun when you're talking about DNA or the creation of information requires energy. And for example, when you had the Cambrian explosion, which was when a few creatures turned into a lot of different types of creatures by mutating all over the place, you had the creation of a huge amount of new information.
00:42:44
Speaker
Wow. Okay. Now this is something that's quite interesting. So if I'm not mistaken, when we're talking about information as we are, we can say it is interchangeable with energy. Is that correct? In a certain sense. Okay. All right. Yeah. This is just new in talking about information available or the creation of information.
00:43:07
Speaker
You said earlier that energy obviously comes from the sun, and you also said that that's responsible for the Cambrian explosion.

Literary Illustrations: Borges' Library of Babel

00:43:17
Speaker
Along with just the natural process of evolution, yes. And then of course the increase in information we are attributing to mutations, and then we get our present biodiversity. Exactly.
00:43:31
Speaker
In 1941, Jorge Luis Borges imagined, in La Biblioteca de Babel, or The Library of Babel, a library with all possible books. This story has direct relevance to information theory. Let's see if you could catch it. A short excerpt will be read in the original Spanish and in English.
00:43:51
Speaker
A cada uno de los muros de cada hexágono corresponden cinco anaqueles. Cada anaquel encierra treinta y dos libros de formato uniforme. Cada libro es de cuatrocientas diez páginas. Cada página, de cuarenta renglones. Cada renglón, de unas ochenta letras de color negro. También hay letras en el dorso de cada libro. Esas letras no indican o prefiguran lo que dirán las páginas.
00:44:19
Speaker
Now in English, each wall of each hexagon is furnished with five bookshelves. Each bookshelf holds 32 books, identical in format. Each book contains 410 pages. Each page, 40 lines. Each line, approximately 80 black letters. There are also letters on the front cover of each book. Those letters neither indicate nor prefigure what the pages inside will say.
00:44:48
Speaker
Now, if you didn't catch it, the significance of that was that there are all these possible books and the number of possible books, the number of books in this library relates directly to the amount of information contained in each book.
00:45:04
Speaker
This is an absolutely fascinating read. I've seen this short story brought up in philosophical circles and in English classes. I think everyone, if they have a chance to, should read it, that and everything else, right? I just love it, the whole idea of an entire universe with books about everything. And as far as they know, it's an infinite universe, is that right?
00:45:25
Speaker
Well, as far as they know, they think it might be infinite. If it were infinite, then it would have duplicates of the books because there's a finite number of possible books. That's right. That's right. All the citizens of this world that is a library, they all live in these little hexagons. They're allowed to move around and they discover new information. And it's interesting because they talk about one of the great discoveries is that they have, what, 25 letters?
00:45:49
Speaker
Yes, if you do the math, that's about 10 to the 1.83 million different books possible. Wow. Wow. Okay. Throughout this short story, we're introduced to a few ideas. Apparently, a lot of the books in this library seem to be quite randomly put together without meaning or purpose, and we can see the immediate philosophical discussions for our world.
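For anyone who wants to check that figure, here is a minimal sketch of the calculation. It assumes exactly 80 characters per line (the story says approximately 80) and 25 possible symbols, and it works with logarithms because the number itself is far too large to print.

```python
import math

# Book format from the excerpt; 80 characters per line is taken as exact here.
PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80
SYMBOLS = 25  # number of distinct characters, as mentioned in the conversation

chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# The number of possible books is SYMBOLS ** chars_per_book.
# Taking log base 10 gives the exponent when written as a power of ten.
exponent = chars_per_book * math.log10(SYMBOLS)

print(f"{chars_per_book:,} characters per book")
print(f"about 10^{exponent:,.0f} possible books")
```

The exponent comes out a little over 1.83 million, matching the number quoted above.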
00:46:15
Speaker
And also, I mean, there's going to be a book where it's absolutely perfect, everything that's in Hamlet, except at the very end, there's just a random Q. Oh, wow. Wow. This is interesting. That reminds me of that old analogy about if you have a room full of monkeys, eventually they'll write the complete works of Shakespeare. So this is a great discussion on randomness in mathematics. And of course, if you wanted to find a book in this library, you'd already have to know what's in the book.
00:46:43
Speaker
because the thing is there's no information about anything in the books because it's all possible books. There's no constraint of language. There's no constraint of content. That's why this library is almost useless. Wow. Not to get cynical about it. Oh, no.
00:47:02
Speaker
Yeah, but wow, that's fascinating. I think it's fascinating that we bring up this short story in our podcast because of course this entire story is based on information and our episode is based on information and the attempts at describing our universe in terms of information.
00:47:18
Speaker
Yes, and what's interesting is that somebody might ask, well, can't you compress the books anyway? And it's like, yes, you can. However, the average size of the books will be the total amount of information in each book, which is, for those interested, around six million bits.
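A hedged sketch of both halves of that remark. The first part estimates the bits per book, assuming 25 equally likely symbols and exactly 80 characters per line. The second part uses a toy library of 3-bit "books" to show the counting argument for why compression cannot shrink every book: there are fewer short strings than there are books. The function name `strings_shorter_than` is made up for the example.

```python
import math

CHARS_PER_BOOK = 410 * 40 * 80  # pages * lines * characters, 80 per line assumed
SYMBOLS = 25                    # assumed equally likely

# Each character carries log2(25) bits when every symbol is equally likely.
bits_per_book = CHARS_PER_BOOK * math.log2(SYMBOLS)
print(f"about {bits_per_book:,.0f} bits per book")

def strings_shorter_than(n_bits: int) -> int:
    """Count all bit strings of length 0 through n_bits - 1."""
    return sum(2 ** k for k in range(n_bits))

# Toy library: every possible 3-bit "book".
toy_books = 2 ** 3
toy_shorter = strings_shorter_than(3)
print(toy_books, toy_shorter)  # more books than shorter strings to name them with
```

Since the library holds every possible book, any scheme that shortens some books has to lengthen others, so the average stays at roughly six million bits.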
00:47:39
Speaker
Wow. Now, this book itself, it has quite a few themes. It talks about, of course, information meaning. It talks about randomness. It talks about mathematical concepts and infinity, I believe. And I don't want to spoil the ending of that short story. Our listeners may want to read it, but he has a very interesting idea about infinity and set theory that he talks about in the very, very last paragraph.

Conclusion on Information Theory's Impact

00:48:02
Speaker
And for those who are interested, it's written by Jorge Luis Borges, that's J-O-R-G-E, B-O-R-G-E-S, and it's called La Biblioteca de Babel, or The Library of Babel.
00:48:16
Speaker
Information not only rules our lives, but as we've seen, rules life itself. And as we learn more about this nascent science, we will learn more about ourselves. We explored the realms of fiction, of life science, of computers, and of the spoken word. We hope that the pervasiveness of the science is self-evident. I'm Jonathan. And I'm Gabriel. And unfortunately, Gideon and Hannah had to leave, but we thank them for their contributions. And this has been Breaking Math.