
AI vs Human writing and what it means for our thinking

S4 E27 · Bare Knuckles and Brass Tacks

What happens when AI-generated text masquerades as human research?

Kimberly Becker, PhD, a corpus linguist, joins the show this week to talk about her study comparing human-written versus AI-generated abstracts in high-stakes healthcare research.

The findings reveal something unsettling about how LLMs may reshape scientific communication. ChatGPT's outputs showed higher informational density, formulaic patterns, and a lack of hedging, the linguistic marking of uncertainty that characterizes careful scientific thinking. The AI doesn't say "may suggest" or "could indicate." It asserts. Confidently. Even when it's wrong.

This matters beyond academia. When we optimize for speed and polish over depth and precision, we're changing how we write, and therefore changing how we think. We're externalizing cognition to systems trained on Reddit threads and blog posts, then wondering why the output feels sterile and an inch deep.

Becker's work raises uncomfortable questions:

  • Are we training ourselves to accept confident wrongness?
  • What happens when a generation of researchers doesn’t communicate uncertainty?
  • And fundamentally, can a predictive text model ever replicate the pause, the breath, the examination that Neil Postman argued was essential to meaningful thought?

This episode is about whether we're paying attention to what we're losing while we chase efficiency.

Mentioned:

  • Relevance Theory (linguistics)

Transcript

Social Media Algorithms and Certainty Amplification

00:00:00
Speaker
I think our brains started to make that shift because we are so siloed. I mean, my LinkedIn is a very specific audience, because I hear from people whose views align with mine; that's what the algorithm gives me and suggests for me.
00:00:17
Speaker
So it started with this kind of siloing that happens in our social media threads. We only hear what we agree with, essentially. We never get any pushback. We don't get any friction about our beliefs.
00:00:33
Speaker
And so there's the idea of certainty amplification, which occurs when something tentative gradually becomes presented as an established fact. We start hearing this boosted language, and it seems normal. "Confidently wrong" is the best way that I can put it. I feel like the entire Internet is the Dunning-Kruger effect, right?

Introduction and Guest Overview

00:01:08
Speaker
This is Bare Knuckles and Brass Tacks, the tech podcast about humans. I'm George K. I'm George. And today's guest is Kimberly Becker, a corpus linguist who joins the show to parse through her expert-level analysis of the differences between AI-generated text and human writing.
00:01:29
Speaker
I saw a post on LinkedIn, I thought it was amazing, and I needed to dig in. And this conversation went in all of the directions that I wanted, which is: what is the impact of this level of writing? What does it do to our thinking?
00:01:42
Speaker
It was very intellectually rewarding. Yeah, honestly, I found it intellectually rewarding. You know, as someone who speaks multiple languages and worked as a cultural interpreter for a while,
00:01:56
Speaker
I think she really hits the nail on the head as to the actual technical foundations of how language is put together, and how it is evolving in, we'll say, a negative manner as a result of widespread LLM usage.
00:02:10
Speaker
But I think the biggest thing with Kimberly is I just found her to be a genuinely lovely human being. She was just enjoyable. And thoughtful. Yes. Super enjoyable to talk to; I just kind of want to talk to her more. Let's turn it over to Kimberly Becker.

AI vs Human Language in Research

00:02:27
Speaker
Kimberly Becker, welcome to the show. Thanks for having me. Yes. So we are excited to have you on as our first linguist guest. And I'm kind of excited to nerd out about that, because it is a passion of mine. But why don't we start in the most obvious place, which is the beginning, and which is why you're here.
00:02:48
Speaker
You had summarized a quick study that you did of language, and it caught my attention because it compared human-generated versus AI-generated language. So give us the high-level summary, and then we'll dig in from there and jump off into some larger questions.
00:03:08
Speaker
Sure. Well, first, let me say my background is in applied linguistics, which is a little bit different from theoretical linguistics in the sense that I'm mostly interested in describing what language does in the real world,
00:03:26
Speaker
and not prescribing what language should do. My training for my master's and my PhD was in corpus linguistics, which is, you know, big databases of language that we study to look for patterns.
00:03:45
Speaker
And actually, the story behind that research is that, I don't know if you're familiar, there's a social media platform for researchers and academics called ResearchGate. It's where we can share our research a little more openly, especially if it's behind a paywall; people can privately message us and we can share preprints more easily. And what I found was that people were citing my research incorrectly after generative AI became so embedded into everything.

Research Methods in AI Language Studies

00:04:23
Speaker
And it caught my eye, because I was like, that's not even what we did at all in that study. How did that happen? And it hadn't happened before that.
00:04:36
Speaker
Now, part of that was that earlier I wasn't publishing as much; as you grow as a researcher, you publish more. So some of it was just that the volume at which I was being cited was higher.
00:04:50
Speaker
And so I started digging into that, and I was like, wow, these language models are changing the quality of how they present evidence. They are summarizing it in a way that's not quite precise or accurate linguistically. And that matters, because there's a hierarchy of evidence in terms of, you know, is it causal or is it correlational, what's the context of the study, what's the population?
00:05:18
Speaker
So this particular study examined a healthcare setting, because I thought: what's one of the highest-stakes contexts that we could look at? And so we took high-impact nursing articles and removed their abstracts; the data set we used was just the text of the articles without the abstracts.
00:05:49
Speaker
And we put those into ChatGPT, the 4o "Omni" version, because this was last year. And we asked it for an

Key Findings in AI Text Analysis

00:05:59
Speaker
abstract. It was a one-shot prompt.
00:06:01
Speaker
Okay. And we told it to use the PRISMA and CONSORT frameworks. PRISMA and CONSORT are reporting guidelines for systematic reviews and randomized controlled trials, in case you're not familiar, and they're pretty stringent structural requirements. Okay.
00:06:22
Speaker
And so, yeah, it was a one-shot prompt. Yes, we know that if we prompted it again, it would be different. We wanted to replicate what a novice researcher, or someone who's not familiar with prompting, might do.
00:06:38
Speaker
And so we put these articles in and we asked for the abstract to be generated. And then we took those generated abstracts and compared them to the actual human-written abstracts, to see what the differences were in the linguistic features.

Implications of AI Language in Misinterpretation

00:06:52
Speaker
And then what were some of the key differences, like the ones that you would, you know, tell somebody on the street? Yeah. So that's a little bit tricky to say; let me just pull up the actual list of findings here.
00:07:12
Speaker
Just for kind of confirmation, though: these researchers were just using a prompt, right? There was no actual manual evaluation of the entire text that then produced a comparative analysis. It's purely prompt-based analytics, right?
00:07:31
Speaker
You mean my research team? No, no. Well, the original, we'll say, I want to call it a false positive: the original inaccurate citations that triggered that research.
00:07:44
Speaker
Was that because no one actually read the full articles? Like they were just working from abstracts? I don't know. I mean, yes, I think that's part of it. And I think part of it is that that's all they had access to; abstracts are what people read,
00:08:05
Speaker
for the most part, when they're just starting to cherry-pick which research they want to use, like in their literature review. I can't know what those researchers were doing. Maybe they weren't reading carefully. Maybe they were non-native speakers trying to interpret my English text and not quite being able to make sense of

Impact of AI on Institutional Language

00:08:29
Speaker
it. But yeah, that was my theory, I guess; my intuition.
00:08:35
Speaker
So we have a set of features that were very common in the ChatGPT texts, and then a set of features which were more common in the human texts. And that's kind of how we did it. It's called key feature analysis. Mm-hmm.
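The key feature analysis Becker names here rests on comparing how often features occur in two corpora. As a hedged illustration only, not her team's actual pipeline, a minimal word-level keyness calculation using the log-likelihood statistic common in corpus linguistics might look like this; both sample texts are invented:

```python
import math
import re
from collections import Counter

def tokens(text):
    """Crude lowercase word tokenizer; a real study would use a proper one."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def keyness(corpus_a, corpus_b):
    """Log-likelihood keyness for every word in either corpus.

    High scores mark words whose relative frequency differs most
    between the two corpora, i.e. 'key' features of one of them."""
    fa, fb = Counter(tokens(corpus_a)), Counter(tokens(corpus_b))
    na, nb = sum(fa.values()), sum(fb.values())
    scores = {}
    for word in set(fa) | set(fb):
        a, b = fa[word], fb[word]
        ea = na * (a + b) / (na + nb)  # expected count in corpus_a
        eb = nb * (a + b) / (na + nb)  # expected count in corpus_b
        scores[word] = 2 * ((a * math.log(a / ea) if a else 0.0) +
                            (b * math.log(b / eb) if b else 0.0))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Invented stand-ins for an AI-generated and a human-written abstract.
ai_text = "The findings demonstrate a significant association. The results demonstrate clear benefits."
human_text = "The findings may suggest an association, though results could vary across settings."
print(keyness(ai_text, human_text)[:5])
```

The real study compared features well beyond single words (noun-phrase density, stance markers, syntactic frames), but the underlying logic is the same: observed versus expected frequencies across the two text sets.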
00:08:55
Speaker
And so basically, for the ChatGPT texts, we found that they reflected three broad stylistic categories. One was informational density, density meaning the language was very compact: lots of noun phrases, with attributive adjectives describing those noun phrases, as opposed to verbal structures where you have a noun, then a verb, then a description of that noun in that kind of clausal structure.
00:09:31
Speaker
We also saw a lot of enumerated and parenthetical packaging of details, which could have been because of the PRISMA and CONSORT frameworks. We're not really sure; we're still digging into this data.
00:09:43
Speaker
Formulaic language: a lot of the same words repeated, words like "although" and "though." It was just very predictable, which is what people see when they say,
00:09:59
Speaker
when they think that they're detecting AI-generated texts, they're like, oh, well, this word keeps being repeated. But it's not just lexical, or vocabulary; it's also syntactic patterns. A good one to give as an example, one that you probably recognize, is:
00:10:18
Speaker
Not X, but Y. Yes, it's a very

Historical Errors and AI's Role in Scaling

00:10:21
Speaker
common phrasing in all LLMs. Yes. "It's not this, comma, but that." Exactly. Yes. Or "the question isn't X, it's Y." That syntactic pattern is really common. I did have an early editor beat that out of me in my writing career.
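Surface frames like these are easy to flag mechanically. The sketch below is illustrative only; the regular expressions and the sample sentence are assumptions of mine, not from the study, and a real corpus analysis would work from parsed text rather than regexes:

```python
import re

# Two formulaic frames mentioned in the conversation, approximated as
# surface patterns: "not X, but Y" and "the question isn't X, it's Y".
FRAMES = [
    re.compile(r"\bnot (?:only )?[^,.;]{1,60}, but\b", re.IGNORECASE),
    re.compile(r"\b(?:isn't|is not|wasn't) [^,.;]{1,60}, (?:it's|it is)\b",
               re.IGNORECASE),
]

def formulaic_hits(text):
    """Return every text span matching one of the formulaic frames."""
    return [m.group(0) for pattern in FRAMES for m in pattern.finditer(text)]

sample = ("This is not a limitation, but an opportunity. "
          "The question isn't whether it works, it's whether we notice.")
print(formulaic_hits(sample))
```

A high rate of hits per thousand sentences, relative to a human baseline, is the kind of signal readers informally pick up on when they "feel" a text is machine-generated.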
00:10:41
Speaker
Oh, really? Yeah. But I think the thing that really stood out to me was you had also found that the ChatGPT abstracts used, I think, what you called a lower degree of humility, or hedging language.
00:10:56
Speaker
Yeah. So in my field, we call this stance and engagement, and it has to do with the way a writer positions themselves in relation to the text: how certain they are, how doubtful they are about it. It's something that I've researched a lot in my career. I'm really interested in how science writers in particular use hedging and stance, because we often have this idea that science or research writing is objective or neutral. Right.
00:11:32
Speaker
And we often hear that prescribed to us. And actually, if you go back and read some of those comments on that LinkedIn post, you'll see that there's a person who is very staunchly prescribing that this just can't be right,
00:11:48
Speaker
that science writing could be full of certainty or doubt, of hedging and boosting and engagement. He just refused to believe that this was true. And a lot of what corpus linguists know from our research is that we can't intuitively know, even as native speakers, what is true about language. It's always an empirical question that we dig into the data for. Mm-hmm.
00:12:16
Speaker
Oh, I mean, there are plenty of people on LinkedIn who will refuse to believe whatever. Oh, sure. Yes. But the hedging patterns were notably absent in the ChatGPT texts,
00:12:28
Speaker
and so that was something that I picked up on really quickly as concerning. Yeah. So...
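To make the hedging contrast concrete, here is a toy sketch of the kind of rate comparison involved. The word lists are a small illustrative subset of published stance lexicons (e.g. Hyland's hedges and boosters), and both "abstracts" are invented:

```python
import re

# Illustrative subsets only; real stance lexicons are much larger.
HEDGES = {"may", "might", "could", "possibly", "perhaps", "suggest",
          "suggests", "appears", "seems", "somewhat", "likely"}
BOOSTERS = {"clearly", "demonstrate", "demonstrates", "proves",
            "definitely", "certainly", "undoubtedly", "establishes"}

def stance_rates(text):
    """Hedge and booster counts normalized per 1,000 word tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words) or 1
    return {
        "hedges_per_1k": 1000 * sum(w in HEDGES for w in words) / n,
        "boosters_per_1k": 1000 * sum(w in BOOSTERS for w in words) / n,
    }

human_abstract = "These results may suggest a benefit, though effects could vary."
ai_abstract = "These results clearly demonstrate a significant benefit across settings."
print(stance_rates(human_abstract))
print(stance_rates(ai_abstract))
```

On the study's finding, the generated abstracts would sit low on the hedge rate and comparatively high on boosters; normalizing per thousand tokens is what makes texts of different lengths comparable.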

AI's Effect on Critical Thinking and Skills

00:12:38
Speaker
Thank you for that.
00:12:40
Speaker
I want to use that to jump off into a couple of other questions. But the first is: you looked at these nursing journal articles, and it occurs to me that across many industries, people are using these large language models to draft reports, strategies, whatever, including in the field that George and I both came from, cybersecurity.
00:13:05
Speaker
They're being wrapped in as features to do incident reporting and, God forbid, threat intelligence, basically commoditizing the writing process.
00:13:18
Speaker
And so I want to get your two cents on what happens when that kind of certainty language starts creeping across institutional language, right? Because if you think about reports and strategies, people build out business operations on top of these ideas. And if there is a lack of expressed uncertainty, I'm just curious what you think that might hold for us.
00:13:53
Speaker
Well, if you back up even further from my realization that people were citing my research wrong: when I really started digging into the language of AI-generated text, I started looking at citation chaining, and what it means when you play telephone with science. Like, so-and-so said something, or I write it and you cite me, but I didn't quite get it right, and so you don't quite get it right, and then someone cites you. And so I started looking. And then it compounds over time.
00:14:32
Speaker
Exactly. I started looking into that, and I came across the Porter and Jick letter. I don't know if you're familiar with this. I am not. Okay. So the Porter and Jick letter was the original, and it was actually a letter, a five-sentence letter from doctors,

Social Media's Context Collapse and Simplification

00:14:51
Speaker
which got cited as one of the first studies about opioids, saying that opioids were not addictive in institutional settings. It was not a research study.
00:15:04
Speaker
It was just a letter to a journal. And that is what pharmaceutical companies picked up and cited over and over and over again. Oh my God. To convince doctors that they needed to prescribe opioids. It wasn't even a study; it was a decontextualized five-sentence letter. You can look it up. And there's a great article from 2018, when some medical researchers realized this.
00:15:33
Speaker
Sorry, I'm just going to pause. That is just absolutely horrendous to think about. Sorry. Okay. Yes. We'll come back to that; we'll come back to something about that.
00:15:44
Speaker
Sure. Yeah. And if you think about it, just take that and expand it out to what's happening now at the CDC with vaccines, with all of this science rhetoric that's in the news. It's very concerning that we don't have a sense of the hierarchy of evidence whatsoever. We don't know the difference between what's found in a randomized controlled trial versus a cohort study. These are things that scientists know that the normal
00:16:18
Speaker
media consumer doesn't know. And so that was where my concern began. And then all of a sudden, as a person who hasn't studied medical writing at all, I'm like, oh wow, this is where we take that instance of someone playing telephone with science, and we scale it with AI.
00:16:39
Speaker
Because people always ask me, can a human make that same mistake? Sure, a human can make that same mistake, but AI can make it at scale. And that's what terrified me. Yeah, machine speed and scale is really the bigger problem. And you can point to the same thing through social media, right? This has been called context collapse, where something just lands in front of your attention and you see a claim or some hyperbolic statement, and you are encouraged through design decisions and UX choices to react to it or comment on it.
00:17:12
Speaker
But you have no further context. And I think we saw that problem accelerate through social media, and now at sort of the human language level.
00:17:24
Speaker
Yes, your point is well taken. But I did not know that about that letter, and, as somebody who knows many people who have suffered through the opioid epidemic, that is heartbreaking, frankly.

AI Language and Cognitive Changes

00:17:43
Speaker
When we come back, we get into the impact that this kind of writing may have on our thinking and our ability to think critically and ask the hard questions.
00:18:04
Speaker
When we read AI text that's missing humility and hedging language, do you think we're processing it differently at a cognitive level? Are we being primed to accept more absolute statements because the linguistic cues for uncertainty are just absent? Because in a lot of the prompting-type work that I do, I'm really big on human-in-the-loop. I take an answer that a prompt produces for me, and I always make sure that, first of all, I rewrite it in my own words. I treat it as an initial draft, not as final.
00:18:35
Speaker
I think in a professional world, it is absolutely unethical to just produce AI slop and use that as the final product that you submit. But I do notice that the certainty of the model, regardless of whether or not the statement is correct,
00:18:53
Speaker
that is an inaccuracy that gets missed. And I fear that a lot of people have become so reliant on these model outputs that we're losing the critical thinking we typically built by actually reading and writing our own material, by reading entire books, not just a prompt that cites a random page.
00:19:16
Speaker
I feel like we are slowly eroding that. And that is a serious concern about the overuse of prompt-based models to produce academic, professional, or even, I hate to say it, self-therapeutic information.
00:19:34
Speaker
Yeah, I

Algorithmic Silos and Perception Shifts

00:19:35
Speaker
think our brains started to make that shift because we are so algorithmically siloed. I mean, my LinkedIn is a very specific audience, because I hear from people whose views align with mine; that's what the algorithm gives me and suggests for me.
00:19:56
Speaker
So it started with this kind of siloing that happens in our social media threads. We only hear... what we agree with, essentially. We never get any pushback. We don't get any friction about our beliefs.
00:20:12
Speaker
And so there's the idea of certainty amplification, which occurs when something tentative gradually becomes presented as an established fact, and then we start hearing this boosted language,
00:20:29
Speaker
and that seems normal. "Confidently wrong" is the best way that I can put it. I feel like the entire Internet is the Dunning-Kruger effect, right? Exactly. It reminds me, honestly, of when I worked in the intelligence world a long time ago, in a uniformed life, and even in my early cyber career doing threat intelligence.
00:20:53
Speaker
And I found that there was an effect where, if someone had a theory, whether it was human competitiveness as an analyst, wanting their thesis to be the one that's gone with, or just a pure "I'm more right than you,"
00:21:09
Speaker
there are people I know that would die on the hill of whatever their assessment was, even if it's plainly wrong, or if, when you dig just a little bit beneath the surface, you're like, ah, you're missing a couple of factors, and this is what it really probably is. And of course, in the intelligence world, they teach you probabilistic language. So it's very, very rare that you will say something is certain. You'll assign a descriptive qualifier to it, like "it is highly probable" or "it is highly likely that."
00:21:38
Speaker
You will never say "for certain" unless you're literally looking at it: this is a tank, we know it's a tank, that's a turret. But if you're looking at a satellite image, it's "highly likely those are armored vehicles," and so on.
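The probabilistic language described here is codified in U.S. Intelligence Community Directive 203 ("words of estimative probability"), a detail not stated in the conversation itself. Its published bands can be sketched as a simple lookup:

```python
# Probability bands and phrases from ICD 203's estimative-language table.
ICD_203_BANDS = [
    ((0.01, 0.05), "almost no chance / remote"),
    ((0.05, 0.20), "very unlikely / highly improbable"),
    ((0.20, 0.45), "unlikely / improbable"),
    ((0.45, 0.55), "roughly even chance"),
    ((0.55, 0.80), "likely / probable"),
    ((0.80, 0.95), "very likely / highly probable"),
    ((0.95, 0.99), "almost certain / nearly certain"),
]

def estimative_term(p):
    """Map a probability estimate to its ICD 203 phrase."""
    for (lo, hi), term in ICD_203_BANDS:
        if lo <= p <= hi:
            return term
    return "outside calibrated range"

print(estimative_term(0.85))  # "very likely / highly probable"
```

Notice what the table deliberately lacks: a "certain" entry. The calibrated range stops at 99 percent, which is exactly the discipline the hosts say LLM output skips.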
00:21:53
Speaker
I find that humanity almost finds its own confirmation bias in that confidently incorrect nature those prompt outputs have, because the prompt said it is so.
00:22:06
Speaker
Do you think we've had a generation now of people who are falling into this cognitive trap? That's the only thing I could call it, a cognitive trap. It's like we've lost the ability to ask, hey, is this actually right?
00:22:21
Speaker
Because I think that's where humanity is starting to lose, and I fear that this is the direction we're going in. I hope

AI Training Data Quality and Influence

00:22:31
Speaker
you, Dr. Becker, can give us some hope that this is not actually what you're seeing at scale.
00:22:39
Speaker
Well, you know, I was doing a little bit of listening to your podcast, and I was really curious about your guest, Mike McLaughlin, and what he was saying about the quality of the data that we're using. I mean, these huge models, Claude, Gemini, ChatGPT,
00:23:02
Speaker
the data that they are trained on is just absolute slop from the internet. So it's not surprising that it is going to be messy and come out not nuanced.
00:23:19
Speaker
And I do think there's hope in that people will, for a variety of reasons, move away from using these huge scaled language models, and hopefully they will see the value in smaller models and really, really vetted data. Because, you know, corpus linguists have been doing this for decades, really looking at the representativeness of the data, and have found over and over again that bigger is not better.
00:23:49
Speaker
It's never better. I love that you bring that up, because we know from the settled lawsuit, for example, that Anthropic and many others trained off of that infamous data set, I think it was called Books3 or something like that. It was just copyrighted material, right? Which is high-quality writing that went through an editing process, professional writers, etc.
00:24:14
Speaker
However many billion words are in that data set, it's a drop in the bucket compared to Reddit and just kind of casual colloquial nonsense. And then, to our previous point, we just came through ten years of social media where the language got progressively more hyperbolic.
00:24:39
Speaker
If you think that I am responding to that on social media, say 2014, 2015, and I start to write like that throughout the internet, whether I'm commenting on forums or on Reddit,
00:24:52
Speaker
it's kind of a feedback loop, right? The bad thinking and the bad writing just reinforce each other. So

Decline in Attention and Critical Thinking due to AI

00:25:01
Speaker
this actually takes us to an area that I want to explore.
00:25:06
Speaker
Definitely more abstract, definitely more philosophical, but
00:25:11
Speaker
Two recent pieces that have really been banging around in my brain: James Marriott, a writer, had this viral essay about what he calls the post-literate society,
00:25:23
Speaker
and Derek Thompson of The Atlantic had "The Decline of Thinking." And both are pointing to this idea of declining attention spans, declining literal word counts per sentence, right? There's an academic study cited that,
00:25:38
Speaker
you know, undergrads can't get through the first page of Charles Dickens's Bleak House. And so I guess I want to take this thread: if certainty language affects our ability to think critically, if we all start leaning on poorer-quality writing, and you just said bigger is not necessarily better, I want to get your take on the impact of this writing on our own thinking, like our ability to articulate and understand our thought process.
00:26:17
Speaker
Yeah, I mean, I'm not a cognitive scientist, and I can't speak to how the brain processes it, but I...
00:26:30
Speaker
I do worry about what happens when we get our output through an intermediary, a machine that is an intermediary. It's sort of like, we all have that one friend who watches TikTok all the time and then diagnoses, I don't know, maybe you don't all know this friend. No, a thousand percent. Yes. Whatever the latest trend is.
00:26:57
Speaker
Yeah, or whatever the latest psychological diagnosis is. I mean, they're sending me, like, oh, I think you're on the spectrum; oh, I think you have this oppositional defiant behavior syndrome or whatever, because of a TikTok video. AI is yet another intermediary force that is shaping things, rather than us getting it straight from the horse's mouth.
00:27:27
Speaker
We're getting it from a machine that has decontextualized and shifted the granularity of the information. And because it does require us to think a little bit harder, and our brain naturally... well, there is a philosophical linguistic theory called relevance theory.

AI as an Intermediary in Thought Processing

00:27:49
Speaker
The basic idea of relevance theory is that we say the minimal thing we need to say to get the point across.
00:27:56
Speaker
And so if the brain is constantly searching for that minimal amount of something to cling onto, that's just how we're wired. And if it's always coming through a machine that has sifted away all the nuance, then yeah, it's very concerning. And the fact is, companies specifically design their UX to take out the friction that you would require.
00:28:32
Speaker
I don't know if you know what I did before this, but, long story short, I had a company with an AI product that was a feedback tool for researchers.
00:28:45
Speaker
And we specifically built in friction. Yeah. And our users really hated it. They just wanted to get to the model; they just wanted to get to the chatbot. But we would ask them all these contextual questions, because they were researchers. People don't like friction.
00:29:02
Speaker
They don't want to be slowed down. They don't want to pause. They don't want to think through context or the level of granularity. They don't want to consider how certain or doubtful they are. I mean, from a behavioral perspective, they have, just like you and I, been trained over the last twelve years: I push a button and get the thing. I hate it, though. I hate it so much, man. I was blessed to have a father who was a very analytical, critical thinker, and his whole obsession was: you and your sisters are going to read, and we're going to sit around and talk about it.
00:29:37
Speaker
And it's given me an advantage. I mean, I kind of hated it as a kid; I just wanted to go play football and dink around with my friends. But no, we were going to do this Socratic exercise as children. So awesome.
00:29:49
Speaker
You had good intentions. He did, and it paid off, in my adult life, my career. But now I have a hard time when people produce all these prompt-based answers so quickly. And that's great.
00:30:03
Speaker
But I still manually read everything, right? And sometimes when they produce these things for me, I'm like, did you actually read the thing that you produced? Can you recite this back to me?
00:30:16
Speaker
And it doesn't make me feel bad, I don't feel bad about it, but it almost offends them. It's like, I get that you put all this effort into prompting a thing, but did you actually read the output?
00:30:29
Speaker
Can we act on it? Is it realistic? Is this something that can be produced, something that can be taken and actioned? Because I fear that our rush to be quick and efficient is going to lead us into making catastrophic technical, scientific, and business decisions.
00:30:48
Speaker
Like, that's ultimately where my fear lies; it's going to drive people over the cliff. It also devalues the work. When you take the time to write something and you can defend it because you've internalized the arguments, that's one thing. But if it's just, "George, go produce words,"
00:31:10
Speaker
"I'll push the button and I'll give you the words," there's no value to it, right? And then my fear is that, well, we already saw this with quote-unquote blog content. Originally, blogs were
00:31:25
Speaker
thoughtful pieces, a way to basically write essays online, and then somebody said, oh, we should use this for Google search engine optimization; I should just pay people pennies on the dollar to write words that matter for search. And it just cratered the quality of the internet, basically. Anyway, that's a tangent.
00:31:47
Speaker
Yeah. This is outside the field of linguistics; this is rhetoric, going way, way back thousands of years: the author-audience-purpose rhetorical situation. And there is no author
00:32:03
Speaker
involved when you prompt a model to extrude some text. Yeah. There's no real author. There's a prompter, but that's not the writer of the text. And on a personal note, I've felt my own attention span, like,
00:32:25
Speaker
I mean, it's taken an acid bath, for sure. But also, I've had trouble getting through older books, because the syntax is completely different, you know: longer sentences, multi-clausal structures.
00:32:39
Speaker
And so with the last few books I've read, I mean, people will say, oh, have you read this book? I've almost just gone full 19th century at this point, because I kind of need the time under tension to retrain my brain: it's okay that this sentence is four lines long, and I should be able to hold

Language, Training Data, and Cognitive Effects

00:33:00
Speaker
these ideas. But I feel like I'm retraining my brain for that. And there's another factor. I want to get into a question on hedging as a training-data issue.
00:33:11
Speaker
But one factor, to carry on what George is talking about: I find when I read material in Arabic or French, my other two languages, my brain kicks back into its original mode of how to read fully structured material.
00:33:29
Speaker
English is my primary language now, but my first language is still Arabic. So when I'm working in those languages, I find my brain shifts, and because I'm not prompting things in those languages, my brain still operates in the traditional mode for them.
00:33:47
Speaker
That's something we could touch on. I think it'd be an interesting thing to unpack, but I want to ask: is the missing hedging just a training data problem that future models are going to fix?
00:34:00
Speaker
Or is there something fundamental about how LLMs process language altogether? And if it's fundamental, what does that mean for AI as a communication tool or as a thinking tool going forward?
00:34:14
Speaker
I think it goes back to the data. I mean, if it's trained on a lot of confidently wrong styles of writing, then it's going to produce confidently wrong slop. And so I think we have to look at the representativeness of the data. It's really just a matter of what tool you want to use. Like, OpenAI just came out with their new scientific research tool.
00:34:42
Speaker
It's called PRISM. And, you know, you can use it to fill the gaps in your writing. It will take a hand-drawn graph and put it into your document for you.
00:34:58
Speaker
But it's still just trained on Reddit and blogs. And I mean, sure, there's some open-source research in there, and some books and some great writing. But

Improving AI Models through Quality Data

00:35:08
Speaker
there's not enough of it, to your earlier point, to make a statistical difference in the predictive modeling patterns. So it all goes back to the data and what's in the back end. And we don't know what's in those datasets. Until a model is small enough that we can vet it and know the representativeness of each genre of writing that's in there, we're never going to know. And again, that's going to take a lot of time.
00:35:42
Speaker
Yeah. My concern, and we'll close out here, is also that even if you got a smaller language model that was trained on rigorously vetted human text, high-quality research language that included the uncertainty and hedging,
00:36:04
Speaker
I mean, fundamentally, at its core, the act of relying on that is doing something to the researcher, right? So I want to end with a quote from Neil Postman's seminal work, Amusing Ourselves to Death, and get your reaction to it as a linguist.
00:36:21
Speaker
So the passage is, quote, writing freezes speech and in so doing gives birth to the grammarian, the logician, the rhetorician, the historian, the scientist, all those who must hold language before them so that they can see what it means, where it errs and where it is leading, end quote.
00:36:42
Speaker
Yeah. Writing freezes speech. Is that how it starts? Yes. Writing freezes speech, and in so doing, you know, we have to hold it up and examine it. Yeah, we have to pause.
00:36:55
Speaker
People don't like to pause. People don't like to take a breath. They don't want to slow down. I mean, I don't want to slow down. I'm not pointing fingers. This is real. I feel sometimes like I'm optimizing myself to death, trying to do things faster, better, multitasking. I have to really slow down and breathe and take a beat. But I think we can hold it up to some very common linguistic frameworks and theories that people have worked with for years to better understand what's inside the language that externalizes our thoughts.
00:37:37
Speaker
I mean, there are some pretty simple frameworks here. It doesn't take someone trained heavily in linguistic theory to understand, for example, the difference between accuracy, meaning polished, precise text that is grammatically exact, and complexity in writing, which is sophistication, depth, that kind of thing. Or even just the concept of breadth,
00:38:08
Speaker
so a lot of different topics, versus depth, you know, going further into those topics and really unearthing what's

Value of Traditional Rhetoric and Writing

00:38:18
Speaker
there. And I think what we're seeing is all this attention to accuracy, maybe some attention to fluency, connection, coherence, that kind of thing, and zero examination of complexity.
00:38:31
Speaker
And a lot of my background is in language teaching. From that perspective, you know that someone has really become proficient in a language when they can tackle the complexity and sophistication of in-depth topics, and not just extrude very polished, grammatically perfect speech.
00:38:53
Speaker
What I'm hearing you say is: bring back cursive. A thousand percent. A thousand percent. I mean, New Jersey just did that, I think, or one of those northeastern states. Yeah.
00:39:05
Speaker
Awesome. Well, Kimberly Becker, thank you so much for joining us pretty early in your day, and also for taking the time to examine this question and to share your results, your initial findings, publicly. Shout out to listener Bronwyn Hudson, who is the reason I saw your post, because again, we're trapped in our algorithmic bubbles, but she's a linguistics major. And I was like, oh, we need to have her on the show. So thank you for coming on.

Episode Conclusion and Listener Reflection

00:39:34
Speaker
Thank you. It was a pleasure. All right. I think the question I would leave you with, listeners, is: where is your attention span? How do you think about writing and reading and their impact on your ability to articulate your own ideas? That's what I would leave you with. Yeah, for me, I would go with: do you believe that you are a critical-thinking individual?
00:39:57
Speaker
And is the rush to produce output decaying your ability to be intellectually honest in what you're actually putting out there? All right. Take that forward, and we will see you next week.
00:40:11
Speaker
If you like this conversation, share it with friends and subscribe wherever you get your podcasts for a weekly ballistic payload of snark, insights, and laughs. New episodes of Bare Knuckles and Brass Tacks drop every Monday.
00:40:24
Speaker
If you're already subscribed, thank you for your support and your swagger. Please consider leaving a rating or a review. It helps others find the show. We'll catch you next week, but until then, stay real.