
How Does Generative AI Represent the Global South?

S2 E1 · Unpacking Us

With all the buzz surrounding AI, we're missing an understanding of how recent AI advancements affect those in the Global South. I talk to Rida Qadri about ways in which generative AI fails to represent those in the Global South, what the implications of these failures are, and what's needed to do better.

Rida Qadri is an interdisciplinary scholar focusing on the cultural impacts of generative AI for people and communities in the Global South. She is a Research Scientist at Google Research, and has a PhD in Computational Urban Science and a Master's in Urban Studies from MIT.

Both Rida and I are speaking in our private capacities, and neither Rida's nor my views expressed in this episode necessarily represent those of our respective employers. 

Transcript

Introduction to Season 2

00:00:04
Speaker
Welcome to the second season of Unpacking Us. I'm your host, Asad Liaqat. In this season, I'm going to change things up a bit in a way that reflects my own progression and interests right now.

New Focus: Tech & Global South

00:00:19
Speaker
I'm going to zoom in the thematic focus of this podcast on technology and development.
00:00:26
Speaker
And at the same time, I'm going to zoom out the geographical focus from talking about Pakistan only to talking more broadly about the developing world or the global south or emerging markets. But at the same time, a lot of our guests and a lot of our focus will remain on Pakistan.
00:00:45
Speaker
We'll talk about how technological innovation is fueling growth in these countries, about how it's enabling exchanges and products and transactions that we couldn't dream of a few years ago. And yet at the same time, it's uprooting livelihoods, spreading discord, disenfranchising large segments of the population.
00:01:10
Speaker
We'll talk about financial inclusion, education, AI, politics, and many other areas as well. We'll talk to builders and doers in addition to talking to some thinkers and researchers as we have in the past. And I hope both you and I will learn a lot in the process.

AI in the Global South: Guest Rida Qadri

00:01:31
Speaker
Today, we're going to start off this second season by talking about an area of technology that's probably top of mind for most people right now, which is artificial intelligence. I live in the San Francisco Bay Area, and that's where most of the AI action is emanating from. There's a lot of excitement around AI, and at the same time, a lot of cautionary voices about bias and potential harms.
00:01:59
Speaker
with concerns that range from super intelligent AI obliterating the human race to threats to national security, mostly to do with the US or the Western world in general. What's absent from this discourse for the most part is a focus on how these technologies are perceived by and how they're going to affect the majority of the human race, which resides in the global South.
00:02:25
Speaker
I'm very lucky to have with me today Rida Qadri, who is an interdisciplinary scholar focusing on the cultural impacts of generative AI for people and communities in the global south. She works as a research scientist at Google and has a PhD in computational urban science and a master's in urban studies from MIT. Rida, welcome to the show.
00:02:47
Speaker
Hi, thanks so much for having me. So I want to talk about the implications of the AI revolution, as we're calling it, on people in the global south in this episode. But before we get deep into that, I want to start off by talking about what AI is and what the recent fuss is all about. So at the heart of it, AI today doesn't seem very dissimilar from another hype term from recent years, which was machine learning.
00:03:15
Speaker
So the basic idea behind all of this is prediction. So the idea being that you can build a model that predicts, for instance, what the weather will be tomorrow, that predicts when a disease may occur, that predicts what the next word in a sequence of words may be.

AI Models and Creativity Challenges

00:03:37
Speaker
Machine learning is being used even today, far more than many of us realize, in how we interact with not just private companies but also with governments and with each other. And so this has been happening since before the recent
00:03:55
Speaker
AI revolution started. On top of those kinds of use cases, the recent advances have been in the specific area of large language models and text-to-image models. And this is an area that I know you have been spending a lot of time working in. So can you tell us what these models are all about, whether it's accurate to call this an AI revolution, and the extent to which they're going to start disrupting the world?
00:04:26
Speaker
Yeah, I think we've basically, you know, always had this long history of trying to simulate various human intelligence tasks computationally.
00:04:45
Speaker
The bar for what counts as intelligence has changed over time, so I would take you back even before this current use of machine learning. We've had various forms of intelligent machines, computationally, for the last 60 or 70 years, and even before that,
00:05:02
Speaker
there have been ideas and imaginations of what intelligent machines might look like and what they could do. And the bar of intelligence has changed: from calculation, so just calculators as intelligent machines, to things like inference and prediction, and now to
00:05:24
Speaker
creativity. So in some ways, you know, we've always used machines as our foils to understand what it is that humans can do exclusively versus what roles machines can take on. And so a lot of the fuss almost always
00:05:43
Speaker
happens within this context, I think, of this existential concern of being replaced, being eradicated, of the machine becoming our overlord and taking over,
00:06:01
Speaker
which also means there is a lot of room for, you know, hoaxes and scams. Some of my favorite ones are from the past: back in the 1700s, people might have heard of the term Mechanical Turk. That was basically this supposedly autonomous chess-playing machine that, for around 80 to 100 years, toured
00:06:30
Speaker
the world around the time of the Austro-Hungarian Empire. People were so impressed by it; it was this humanoid kind of machine that played chess.
00:06:44
Speaker
And eventually it turned out it was basically a cabinet with an actual person sitting inside playing chess. So I think because of this desire to see intelligence and humanness outside of us, we've fallen prey to a lot of hype. And obviously, when you think of some of these machines now, they seem pretty rudimentary and basic, but
00:07:09
Speaker
at that time, these were the large language models and text-to-image models of their day. So I think any understanding of, or any response to, the technologies of today has to be put in the context of the history of technological development and how we have always made some sort of incremental progress.
00:07:33
Speaker
So having said that, the current technologies are very impressive. This is the first time that we have now forayed into the world of actual generation and creativity, as opposed to just inference and prediction. And that's why I think
00:07:52
Speaker
this has really captured the public imagination, because we've always been told that creativity is the sole realm of humans, and it's what makes us human. So what these models basically do, and I'm going to simplify this, and I'm sure a lot of computer scientists will be very upset with me, is they take a whole bunch of
00:08:21
Speaker
data, so, you know, millions and billions of data tokens, and they learn patterns between them. So they learn, as you said, which word is most likely to come after "my". In image space, they learn relationships between
00:08:49
Speaker
images and pixels and labels. That's why, if you put in "a Pakistani woman" in DALL-E or Stable Diffusion or Midjourney, what you are getting is an amalgamation of the machine's learning of which pixels and which images have
00:09:18
Speaker
been associated in the data corpus with the word "Pakistani" and the word "woman". So computationally, they are very impressive, because it is very complex not just to build neural networks that let the machine learn these patterns, but also to then be able to
00:09:48
Speaker
generate things that are, in the case of images, aesthetically pleasing, beautiful, wondrous, fantastical, and in the case of text, coherent; they have the basic structure of human conversation and language now. But it is important to remember that so far there is not really any proof that any of these
00:10:14
Speaker
technologies have any actual understanding of what they are generating. There is no intentionality behind it.
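
To make the idea of learned associations concrete, here is a minimal, purely illustrative sketch (not how any production model actually works): it counts which word follows which in a tiny invented corpus and then always emits the most frequent continuation.

```python
# Toy sketch of "learn associations, then emit the most likely continuation".
# The corpus is invented; real models learn billions of parameters, not raw counts.
from collections import Counter, defaultdict

corpus = "the mosque is old . the mosque is old . the mosque is beautiful ."
tokens = corpus.split()

follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1            # count how often nxt appears right after prev

def most_likely_next(word: str) -> str:
    """Greedily return the continuation seen most often in the training text."""
    if word not in follows:
        return "<nothing learned for this word>"
    return follows[word].most_common(1)[0][0]

print(most_likely_next("is"))          # "old": seen twice, versus once for "beautiful"
```

The same frequency-driven logic is what comes back later in the conversation as "empirical abundance".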

Bias and Cultural Representation in AI

00:10:22
Speaker
There's no model of what the world looks like, no inherent understanding of what bias is, what toxicity is, what stereotypes are, what love is, none of the intentionality that actually underpins human communication. So at least part of the hype, then,
00:10:43
Speaker
is basically because we end up inevitably comparing these models to ourselves. And so then part of the reason why we weren't so worried about predictive models in the traditional sense
00:11:00
Speaker
was that even if a weather predictor is really, really good, it doesn't threaten us because humans are not particularly good or known for predicting the weather. On the other hand, if it starts mimicking humans even at a very basic level, then we start immediately feeling threatened.
00:11:19
Speaker
I guess I want to move on now to start thinking about the ways in which, as you started alluding to, these models represent different populations, which is in large part the core of your work.
00:11:40
Speaker
I again want to caveat that this is not a first. It's not the case that predictive models have not been biased in the past. It's not the case that we have not been worried about bias and AI inequity in the past. So kind of a prime example that worried a lot of people in the US, for instance, was when the judicial system started using
00:12:02
Speaker
predictive models to start predicting whether a given person is likely to commit a crime or not. And inevitably, because the data that was fed into these models showed a disproportionately large number of Black people committing crimes, because they were convicted of crimes in a biased system, these models then took that data and started predicting that Black people were more likely to commit crimes. And that was just a feature of the data being fed to the model.
00:12:31
Speaker
So it's not new, but I take it that there are unique ways in which these text-to-image models and these predictive large language models fail to represent certain populations or are likely to cause harm to certain populations. So what are some of those ways in which you're worried about inequity and harm here?
00:12:56
Speaker
Yeah, so as you said, this is not new. Even before AI, technologies have always been embedded in social contexts and inherit a lot of bias, as you said, but also social inequity. And so if you are introducing technology into a world that is socially unequal, that has
00:13:23
Speaker
problems of racism and sexism and other forms of inequitable power distribution, it's very unlikely that those technologies won't create
00:13:37
Speaker
or reflect outcomes that are reflective of those biases. So think about it: these technologies are basically becoming, in some ways, technologies of media production, of knowledge generation, and also of
00:14:00
Speaker
accessing a lot of knowledge, right? On their own, if these were just tools, toys that we were sort of playing with, it might not be that much of a concern, but they're increasingly being integrated into a wide variety of products. So things like search and information retrieval, things like generating, as I said, knowledge through writing articles, writing stories,
00:14:29
Speaker
media production, so things like scriptwriting, artistic production, marketing material. So the concern that I have is that technologies don't necessarily get adopted because they're very good; they also get adopted
00:14:57
Speaker
because people are very excited about them. There's a lot of hype. There's a fear that no one wants to be left behind. And truly, this is the first time that I have seen such global excitement about generative technologies. When I would go to Pakistan, people were not really talking about the dangers of crime prediction models. They weren't talking about, oh, do you know,
00:15:21
Speaker
the Punjab police is going to use predictive models for crime detection. At least the average person at home was not. And this is the first time that, actually, every time I go back home, people are talking about generative AI. These conversations are now happening, adoption is happening. What are some of the ways in which you're already seeing it being used in Pakistan?
00:15:50
Speaker
So, I don't know if it's being used at an enterprise level, right? Because for enterprise, right now, the tools are very rudimentary and they're just being rolled out. But in terms of everyday use, people are
00:16:05
Speaker
using these technologies in all sorts of ways: smaller startups, like fitness startups, using them to generate diet plans; small-scale media houses using them to generate articles; people using them to generate marketing materials.
00:16:24
Speaker
So it's interesting to me that these technologies are being used in these cases, because I don't think anyone has really thought about whether they are actually well suited to these use cases. I think, when ChatGPT came out, there was so much awe
00:16:47
Speaker
at its technological abilities and its outputs that people in some ways felt like it could do anything. So one good example: large language models are not necessarily very good at information retrieval, because they're not Google Search, they're not Bing Search;
00:17:11
Speaker
what they're architecturally designed to do is not necessarily to give you the most truthful or the most relevant response. But because
00:17:22
Speaker
there was no such recognition, people were using it to search for things, like what I was talking about with the diet plan, right? People were using it the same way you would use a search engine: what is the best diet plan for me to lose 10 kg of weight? So people were not, I think, able to recognize, and companies were not telling users, what the limitations were, what the difference between
00:17:48
Speaker
text association and retrieval is. Which is why the question of hallucination was not something people were thinking about, right? Like if you are using ChatGPT or any large language model to create a course syllabus, for example. At least at the

Generative AI in Pakistan

00:18:04
Speaker
start, there were massive cases of hallucination, where it would just make up articles that sounded kind of
00:18:12
Speaker
right. So these are larger questions about how people use and adopt technologies. Companies don't necessarily give users a lot of education about what these things are good for and what they're not, and some of this will get ironed out. But specifically in the case of the global South, the concerns that come up in some of my work are, one, in terms of
00:18:42
Speaker
whose cultures and whose opinions and perspectives get surfaced and represented more than others. So as an example, if you are trying to use large language models as a tool for information retrieval, right, you're like: give me a list of the top 10 music artists
00:19:06
Speaker
in the world, or the best music artist, or any kind of retrieval of cultural representation, for example. Most of the time, large language models are going to give you answers featuring artists and musicians from the Western world, right? Or if you ask it to explain what classical music is. So these are all
00:19:37
Speaker
questions of knowledge representation. The answer that you will get is most likely that classical music is Bach and Mozart, as opposed to, say, Hindustani classical music.
00:19:51
Speaker
So this amplifies a lot of the problems we've had with search in the past, where, when we use metrics like relevance, authoritativeness, and who's gotten more clicks as a way to surface certain information, we
00:20:13
Speaker
erase a lot of niche knowledge, or knowledge of other cultures that might not be as computationally active, where the histories and cultures of those people might not be as digitized as others'. So that's one big problem: if these tools become more and more integrated into search and into
00:20:40
Speaker
knowledge production tools, whose cultures get more represented and whose get less? It's almost like, you know, previously
00:20:52
Speaker
there were many cultures that relied essentially on their content being available to the world via search for them to be represented in any global discourse. And now they're even a step removed from being represented, in the sense that they must hope their content is out there in the training data of certain algorithms, such that when that algorithm is asked a question about
00:21:18
Speaker
something of relevance to that culture, their voice would surface from that process of the model predicting the next word. And so from the perspective of people trying to make themselves known, that process has just become much harder. Yeah, and it's not just about
00:21:39
Speaker
being present in the training data, but also being present in enough numbers. So empirical abundance is a huge issue right now, where models are, you could say, greedy in some ways: they will give you the most empirically obvious and empirically abundant token. Think of it almost like a majority vote.
00:22:05
Speaker
If there are five possible associations with one word, the one the model will give you is the one that is highest in numbers. So you could actually still be represented in the training data and lose out. If I say, what's the best landmark, or what are the five top architectural styles in the world?
00:22:29
Speaker
Gothic or neoclassical architecture might have a thousand hits, let's say, and, I don't know, Indo-Saracenic or Mughal architecture might have five. So the model will most likely go to the one with a thousand as opposed to the one with five.
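
A hedged sketch of that majority-vote point, with invented counts: under greedy selection the association seen a thousand times wins every single time, while sampling in proportion to the counts surfaces the five-example association only a fraction of a percent of the time.

```python
# Illustrative only: the styles and counts below are made up to mirror the example.
import random

associations = {"Gothic": 1000, "Neoclassical": 800, "Indo-Saracenic": 5}

def greedy_pick(counts):
    return max(counts, key=counts.get)          # majority vote: always "Gothic"

def sample_pick(counts):
    styles, weights = zip(*counts.items())
    return random.choices(styles, weights=weights, k=1)[0]

print(greedy_pick(associations))                # Gothic, every time
draws = [sample_pick(associations) for _ in range(10_000)]
print(draws.count("Indo-Saracenic") / len(draws))   # roughly 5 / 1805, about 0.003
```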
00:22:47
Speaker
And the other question is not just one of quantity but of quality: not just whether you are represented, but how you are represented. So as an example, some research has recently shown how the internet is an uneven archive of our cultural history:
00:23:11
Speaker
who is able to tell their stories versus whose stories are told and whose perspectives are represented. In the case of Flickr data, for example, someone did a really good study showing that most of the images of African contexts on Flickr are taken by tourists,
00:23:31
Speaker
whereas in Europe, most of them are taken by locals, right? So think about Pakistan: the photos tourists take of Pakistan versus the photos locals take will be very different in terms of what aspects of Pakistan are represented. Similarly, Wikipedia, which is a huge data source for training models right now and is almost considered a ground-truth dataset, right, because it's considered the source of most
00:24:01
Speaker
human knowledge right now. In Wikipedia, for countries within Europe and obviously the US, the majority of articles written from those countries were written in local languages, right? French for France, German for Germany, and so on,
00:24:28
Speaker
as opposed to articles written about and within most of Africa, South Asia, and East Asia, which were written in English. So think about this, right? If most of the knowledge being generated on the internet about Pakistan is in English, whose perspective is that representing? It's either Western media or a very particular class in Pakistan.
00:24:56
Speaker
So that problem is also present in Wikipedia, right? So even if I don't use a large language model and I want to figure out something about some cultural phenomenon in China, when I do a Google search, Wikipedia will likely be on the first page of results, and then I will learn about it from a Western perspective.
00:25:16
Speaker
Is your argument partly that large language models are making it even worse, in the sense that if I search hard enough, I may be able to find, on the third page of search results, something that's coming from some kind of local perspective, but when I ask a large language model, it's only going to give me what it thinks the majority thinks, which on this subject will most likely be a Western view of how things are?
00:25:42
Speaker
Yeah, so take the example of image search. Let's say you want to search for something Pakistani, like the Badshahi Mosque, or scenes from Lahore, I don't know. You'll be able to scroll through hundreds of images and get to one that you feel is relatively
00:26:11
Speaker
decent. In the case of generative models, as you said, you're one step further removed: you're dependent on what the model is generating, and there's no way for you to,
00:26:27
Speaker
unless you're really good at prompting, push the model into a very different stream of representation or a very different stream of association. And I think that's the biggest problem right now: users have very little agency over what gets generated and how it gets generated. So prompting is one thing people have been trying out, where you tag on different words and key terms for
00:26:56
Speaker
the model in the hopes that it'll give you something you want. That obviously doesn't change the underlying dataset. So if something is not present in enough quantity and those associations haven't been built, it just will not be generated. I, for example, tried this out with local cuisines and foods.
00:27:22
Speaker
You know, I was trying this out with some foods from the African context, so waakye, which is a really popular dish in Ghana. If you put it into Google Search, you get dozens of hits. But when you ask image generation models to generate it, they don't do that well,
00:27:40
Speaker
because that association just isn't present in enough volume to have been developed. And there's no way for you to understand why this is happening. And that, I think, is something a lot of people are not necessarily working on: how do we
00:27:58
Speaker
stop guessing what the user wants from a prompt? And how do we stop keeping all the generative control behind the interface, and instead give users more control and allow them to create better prompts by
00:28:20
Speaker
educating them about prompt specificity? So one big example of this: there's been a huge conversation right now, huge in our world, about the stereotypes that text-to-image models generate. If you put in "Pakistani men", you will get what people consider a stereotypical image of a brown man, maybe in some sort of religious garb, I don't know.
00:28:49
Speaker
If you put in "scene from Mumbai", you get scenes that look like they're right out of Slumdog Millionaire or something. And one way companies have traditionally tried to do bias mitigation or bias removal is automated classifiers.
00:29:08
Speaker
What that means is they train a model to recognize hate speech, they train a model to recognize toxicity, they train these classifier models to recognize skin tone, and then they say, okay, as the model is generating something, it passes through these classifier checks. If it passes the checks, then it's shown; if it doesn't, it's not shown, in a very crude, rudimentary fashion.
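
A crude sketch of that generate-then-filter pipeline as described; the keyword-based scorer below is a toy stand-in for the trained classifiers a company would actually use, and the threshold is invented.

```python
# Generate a candidate, score it with "classifiers", and only show it if every check passes.
from typing import Callable, Optional

FLAGGED_WORDS = {"exampleslur"}        # toy stand-in for a trained toxicity/hate-speech model

def toxicity_score(text: str) -> float:
    words = text.lower().split()
    return sum(w in FLAGGED_WORDS for w in words) / max(len(words), 1)

def gated_generate(generate: Callable[[str], str], prompt: str,
                   checks, threshold: float = 0.5) -> Optional[str]:
    candidate = generate(prompt)
    if any(check(candidate) > threshold for check in checks):
        return None                    # blocked: fails a classifier check, nothing is shown
    return candidate                   # passes every check, shown to the user

# Usage with a dummy "model":
print(gated_generate(lambda p: "a harmless generated caption", "Pakistani men",
                     checks=[toxicity_score]))
```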
00:29:35
Speaker
There are obviously problems with the classifiers too, because what counts as hate speech in the US is not really what counts as hate speech in Pakistan. The axes we're looking for differ: in the US, classifiers are mostly concerned with questions of race, whereas in Pakistan, race is not really the major axis of social discrimination and differentiation. It might be class, it might be religion,
00:29:58
Speaker
it might be particular sects within a religion. So social conflict is different in Pakistan versus the US. That's one big problem. But beyond that, think about how you would build a classifier for a stereotype. Stereotypes are so contextual. Whether something is a stereotype versus just a descriptor changes based on who is
00:30:23
Speaker
using the stereotype, where it is being used, whether it is being produced in the West versus in Pakistan, whether it is the only thing being generated versus part of a huge range of other things being generated. So that's why, now that we have started building technologies of representation and cultural representation, some of our traditional

Addressing Stereotypes in AI

00:30:47
Speaker
tools of bias mitigation don't work as well, right? It's the same thing as, think about it, if you're writing a film, you will never be able to write one film that represents the entire world in extreme nuance. And similarly, you're never going to get a model that is able to represent the nuances of Pakistani culture in all of its diversity and multiplicity
00:31:16
Speaker
in a way that everyone in Pakistan feels is representative, right? That's just never going to happen. So that seems to be part of the prompter's problem also, right? Like if there is a prompter who goes to a text-to-image model and says, give me an image that is typical of Lahore:
00:31:34
Speaker
if you were asked that question manually, what would you do? You would apply your own biases to it. You would say, these are the neighborhoods of Lahore that I'm comfortable with, these are the kinds of things in Lahore that I really like, so I'm going to put them at the center of it. Any kind of representation of a complex
00:31:54
Speaker
culture, a complex geography, if it is summarized in one image or one page, is going to suffer from that problem, right? And I totally hear you on where the line is between description and stereotyping, in that at some point, descriptors that
00:32:15
Speaker
the people being described don't appreciate, or that harm them in some way, are going to be seen as harmful. And let's say there's a classifier that starts predicting that; in many cases, it could still be an accurate description. And so in that case,
00:32:36
Speaker
I think I agree with you, and perhaps I would even more strongly advocate against a classifier approach, because I think in many cases we have seen how attempts to, let's say, rule out
00:32:52
Speaker
certain kinds of hate speech or certain kinds of misinformation just end up privileging one culture over another. We see that in the case of the current Israel-Gaza conflict, in that, in many cases, features that are meant to censor hate speech or banned speech end up disproportionately affecting content from one side. So we see that any content that can potentially, in a vague way, be seen as praising Hamas
00:33:20
Speaker
is going to be demoted, et cetera. And this is across platforms, right? I'm not talking about any particular platform. And so if we start implementing similar approaches in responses to large language models or text-to-image models, I worry that we'll just end up with a hodgepodge that ends up even more strongly representing the biases of
00:33:50
Speaker
the countries in which the makers of these models are based. Yeah, and I think that's the big thing that is tricky right now. A lot of these models are being built in companies that are traditionally engineering and computer science companies.
00:34:10
Speaker
And, with all due respect to all the incredible engineering work that is happening in these companies, it's very difficult, I think, for a lot of engineers to be comfortable with that kind of social complexity and to understand that this is not a problem that can be solved
00:34:35
Speaker
by engineering your way out of it. So going back to your earlier point that everyone brings their own cultural biases to answering questions of representation, right? If I ask you, can you describe Lahore to me, you will describe it in a very particular way, and I will describe it in a very particular way. And I think what's important to recognize is
00:34:56
Speaker
that these models are already bringing particular cultural perspectives to bear on such answers. But while I would recognize, when I ask you, that you are bringing a cultural perspective, right now in the models there is no accounting for that. And this cultural perspective is not just being brought in through datasets, which, as we talked about, primarily represent Western perspectives. It's also being brought in through the annotations and data labels
00:35:25
Speaker
of a lot of data annotators: even though they are in the global South, the way they are trained to label data
00:35:33
Speaker
is through very Western perspectives. I want to pause you for a second there, because I think this deserves a bit more explanation for our listeners, in the sense that our understanding of how these models are built is often that there's just data out there that already exists, that we can access, and that is usually free. And I think the reality is that often there is data in there that has been purposely labeled for the purposes of the model. Can you tell us a little bit more about
00:36:02
Speaker
that? Yeah, so I think that's actually a really important point. As you said, a lot of us assume that these models are great examples of technological and autonomous capabilities, but as a lot of research, including work from Mary Gray and others, has shown us, there is a lot of human labor, at various scales, at the data

Human Role in AI Development

00:36:28
Speaker
level that goes into training these models. So there are a couple of ways in which human annotator labor shows up in these models. One is, of course, labeling the data corpus. Another one that is now increasingly being used
00:36:46
Speaker
is what you might have heard called reinforcement learning from human feedback. That is basically where companies very quickly realized that they are wading into a
00:37:02
Speaker
social world that is complex, and, you know, they can't just build classifiers for toxicity and hate speech. So what they do is they hire entire armies of data annotators and data labelers,
00:37:17
Speaker
and they show them generated text and ask them to rate it on toxicity, on safety, on whatever categories you give the raters, because we recognize now that human judgment on a lot of these
00:37:35
Speaker
fuzzy social ideas is better than machine judgment. And then the model starts picking up patterns in what was called safe by human annotators and what was called unsafe, and in that way you steer the model to produce more and more of the safe responses.
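
A deliberately simplified sketch of that feedback loop: human raters label sample outputs as safe or unsafe, a trivial scorer is fit to those labels, and candidate outputs are then reranked toward what the raters called safe. Real RLHF updates the model itself with reinforcement learning; this sketch only reranks, the data is invented, and it also shows how the scorer inherits whatever judgments the raters brought.

```python
# Fit a trivial "reward" scorer to human safety ratings, then prefer higher-scoring outputs.
from collections import Counter

human_ratings = [                      # (model output, rater judgment); invented examples
    ("here is a balanced answer", "safe"),
    ("here is an insulting answer", "unsafe"),
    ("another balanced answer", "safe"),
]

word_votes = Counter()
for text, label in human_ratings:
    for w in text.split():
        word_votes[w] += 1 if label == "safe" else -1   # words seen in safe outputs score up

def reward(text: str) -> int:
    return sum(word_votes[w] for w in text.split())

def pick_safest(candidates):
    return max(candidates, key=reward)

print(pick_safest(["an insulting answer", "a balanced answer"]))   # "a balanced answer"
```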
00:38:00
Speaker
As I described this, you might be able to pick up on a huge problem here. The cultures of the annotators now matter a lot because annotators are bringing their own subjectivity to what is safe and what is not safe. So I want to take a step back and summarize some of the ways in which
00:38:22
Speaker
in some sense, you've already ruled out, at least in my understanding, some of the potential ways in which people are suggesting we improve these models. So one
00:38:36
Speaker
step that people think of is, well, we just need more data that represents the world in a better way. And so that means, both in terms of quantity and quality, let's represent different cultures of the world better, right? That's one way of doing it. The other is this kind of reinforcement learning paradigm: if only there are enough humans who are in some sense better representative of the world
00:39:02
Speaker
in some manner, who then give feedback to these models, and in this case these can be kind of one-size-fits-all, massive models, then that's how we make these models better. Would you say that there is any hope in these two larger frameworks?
00:39:20
Speaker
Yeah, yeah, so I think, yes, there is hope in the sense that all of these things will incrementally make models better and more representative in some ways. I think the bigger question really is,
00:39:37
Speaker
to what end, right? What is the problem that you're trying to solve? So if you're trying to solve the problem, let's say, of the fact that when I put in "Badshahi Masjid" in DALL-E, and this is a made-up example, DALL-E does recognize Badshahi Masjid, but say that's the problem, right? I put in Badshahi Masjid and it's not recognizing it and is just giving me
00:40:03
Speaker
a generic random mosque, then yes, one way to solve the problem is to just get more images of Badshahi Masjid, and that kind of solves that problem. If you're trying to solve the problem of how a Pakistani man is represented, the solution or the end goal is unclear. The end goal could be that
00:40:32
Speaker
we want to create as many diverse images of a Pakistani man as possible, in which case data diversification is one way to solve this problem. Getting more human feedback is another way of solving, well, not solving, mitigating this problem.

User Control in AI Content

00:40:53
Speaker
I think the biggest thing, the thing that for me these still don't do, is this.
00:41:04
Speaker
Think about when we say we want to represent something better: what does that mean? Because that is still a value judgment, right? And as a company that is trying to serve billions of people around the world,
00:41:23
Speaker
you will never be able to make the right value judgment for everyone. And that's why I think, for me, the answer lies much more in trying to give users more agency
00:41:39
Speaker
over what is being generated and how it's being generated. One way that is now being done is what people call fine-tuning, where the base model and its base capabilities have already been developed,
00:41:56
Speaker
and on top of that, users can give very, very small datasets, 500 images, let's say, hopefully at some point even 20 images, and steer the model towards what they want to generate. The other thing to think about is
00:42:16
Speaker
models that are more specifically trained for a particular use case, because the representational stakes in a model being used for marketing material are very different from the representational stakes in something being used to generate scripts for
00:42:36
Speaker
Hollywood, for example. One of the things that came up a lot in some of the studies we've done is people saying that what the model generates almost feels like it's compounding the stereotypes and histories of representational bias that
00:42:59
Speaker
people in Pakistan, for example, have suffered at the hands of Western media. And so that becomes a feedback loop: models produce these images, those images are used to retrain the models, and the models produce more of those images. The fear is that Pakistani voices will be lost
00:43:19
Speaker
in the crowd, right? And we'll only be able to see through one very particular lens. For both of these, it seems that the end goal is essentially users having some kind of tailored model, whether it's this massive model that's been tailored with some input from them, or a smaller model that's been built for their particular use case,
00:43:42
Speaker
but that's now tailored in a particular way to their context, such that it does a much better job of representing the thing they need represented. And so then, in some sense, the onus is on both the producer and the user of this model to be mindful of where the problems are and how to solve them. At some point, I thought that you would go the route of:
00:44:10
Speaker
we expect too much from models, right? The question of representation is too complex, and we are in this world where we're actually taking some kind of artificial general intelligence seriously and saying a model can, in fact, represent something super complex like a culture. And so one way is to just take a step back and say this is a futile endeavor; let's not expect models to do these things. Yes, models can make,
00:44:40
Speaker
you know, ads about, I don't know, the monsoon season in Lahore, but they'll do a pretty bad job for the most part. And so we take them as a starting point, and then we have a human creative person who goes in and makes the right edits and whatnot.
00:45:01
Speaker
But you're not taking that extreme position. You're saying that it is possible within this kind of paradigm, well, possible in the sense that we should strive for it; I don't think anybody knows what's possible and what's not right now in this space, but we should strive towards either creating ways for users to personalize models to themselves by providing them
00:45:27
Speaker
with the right images or the right text, or this different enterprise of creating tailored or bespoke models, so to speak. Yeah, I mean, I think what I'm saying is:
00:45:43
Speaker
I agree with your starting point that representation is difficult and complex. What I'm saying is impossible is for models and companies to guess the representational outcome that every single user needs, right? There's no way to do that.
00:46:00
Speaker
So now assume we accept that, which right now is not something people in a lot of these companies necessarily recognize, right? The desire is still for the model to produce the best representational outcome, so even that acceptance is a big step.
00:46:19
Speaker
Do you think human beings can do that? Let's say, instead of this model, or in addition to this model, I have a model operator, think of an elevator operator, who sits there and interviews you when you come in wanting to make a prompt, and then, based on what they learn from you, they craft the right prompts. So in that sense, using
00:46:40
Speaker
a model together with an intelligent human being can solve this problem. And I think a large part of the fear, or the excitement, about AGI, artificial general intelligence, is that this human-plus-machine combo can be replaced by just a machine, that the model alone can do that well. So part of what I hear you arguing is
00:47:04
Speaker
that we can never imagine that. We can never imagine a machine, a model, smart enough to understand all the nuances and complexities of what you want from it. And therefore we always need this person sitting in front of the model, interpreting what's going on.
00:47:22
Speaker
I mean, I will never say never; who knows what the world of technology has in store for us five years from now, 30 years from now, 50 years from now. But right now, what we don't have models doing, and it seems very difficult to do, is to encode all of the complexity and the social nuance that representational questions require,
00:47:43
Speaker
to encode all of that into some big decision tree and put it in a model. Think about all of the implicit judgments that you make as a human being when you are trying to understand representation. Even if you start noting them down, there are going to be so many that you miss and that are very crucial, and you are only one human being
00:48:04
Speaker
thinking about whether something is representative or not. So that's very difficult for the model to do. So yes, I think the model can be an ideational starting point; it can show you what's possible, and then you and the model, in iteration, figure out
00:48:21
Speaker
the direction of representation that you want. So I think that's one good model. Another interesting model relates to what search does right now. In the case of a search engine, I put in a search query; in the case of these models, I put in a prompt that is what we call underspecified in some direction. So I put in "Pakistani men",
00:48:45
Speaker
and that's underspecified in the sense of: what kind of Pakistani man? What is the Pakistani man doing? What is the Pakistani man wearing?
00:48:56
Speaker
What does the Pakistani man's skin color look like? There are a lot of assumptions in the word "Pakistani". Right now, what's happening is the model is just supposed to guess what you meant by "Pakistani man". So now imagine an interface that recognizes that you have put in an underspecified prompt and actually gives you some tags that you can add. It can say, you know,
00:49:20
Speaker
we saw that you said Pakistani, but did you mean one of these 30 or so ethnic groups in Pakistan, these linguistic groups, right? To start getting the user to think about the direction they even wanted, the implicit representational goals they had, which we often don't make explicit and clear when we're interacting with search engines or machines.
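
A hypothetical sketch of such an interface: it scans the prompt for overly broad terms and returns clarifying tags to show the user instead of silently guessing. The term lists are invented placeholders, not a real taxonomy.

```python
# Detect underspecified terms in a prompt and suggest clarifying tags for the user to pick.
CLARIFYING_TAGS = {
    "pakistani": ["Punjabi", "Sindhi", "Pashtun", "Baloch", "urban", "rural"],
    "musician": ["classical", "qawwali", "folk", "pop"],
}

def suggest_tags(prompt: str) -> dict:
    """Return follow-up options for any broad term found in the prompt."""
    found = {}
    for word in prompt.lower().split():
        if word in CLARIFYING_TAGS:
            found[word] = CLARIFYING_TAGS[word]
    return found

print(suggest_tags("Pakistani musician on stage"))
# {'pakistani': [...], 'musician': [...]}  -> rendered as clickable tags before generating
```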

Enhancing User Involvement in AI

00:49:43
Speaker
Another model to think about is
00:49:47
Speaker
giving users more control over the direction in which the model is going. Think of it like this: some of the models right now have levers where you can increase the randomness of the generation. So imagine a bunch of these levers. Think of it in terms of:
00:50:08
Speaker
if you put in "give me the best musicians of the world", you could actually then have levers on the side that allow you to toggle
00:50:25
Speaker
how diverse you want the results to be in different directions: do you want more geographical diversity, more genre diversity, more gender diversity? So again, it's trying to elicit what kinds of representations the user wants, and the user might not even know, right? They might not even know that they want these kinds of diversification.
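
One way to picture such a lever, as a hedged sketch with invented data: a re-ranker whose diversity knob shifts weight from raw popularity toward regions not yet shown, similar in spirit to maximal-marginal-relevance re-ranking.

```python
# Re-rank candidates: diversity=0 means popularity only; diversity=1 means favor unseen regions.
def rerank(candidates, diversity: float, k: int = 3):
    """candidates: list of (name, region, popularity in [0, 1]); diversity in [0, 1]."""
    chosen, seen_regions, pool = [], set(), list(candidates)
    for _ in range(min(k, len(pool))):
        def score(item):
            name, region, popularity = item
            novelty = 0.0 if region in seen_regions else 1.0
            return (1 - diversity) * popularity + diversity * novelty
        best = max(pool, key=score)
        chosen.append(best[0])
        seen_regions.add(best[1])
        pool.remove(best)
    return chosen

artists = [("Artist A", "US", 0.99), ("Artist B", "US", 0.97),
           ("Artist C", "UK", 0.95), ("Artist D", "Pakistan", 0.40)]
print(rerank(artists, diversity=0.0))   # popularity only: A, B, C
print(rerank(artists, diversity=0.8))   # spreads across regions: A, C, D
```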
00:50:49
Speaker
But in the absence of trying to have that conversation with the user, the model will always guess. It will always default to the majority perspective. It will always try to assume user intent, which doesn't always go that well. I find that very interesting, but it's almost like, I guess, the initial worries in the conversation were about
00:51:16
Speaker
defaulting to the majoritarian view, and at that time I was thinking, this is cultural homogeneity in the extreme, in that we end up with no trace of cultural differences. But this is interesting, because this is both forcing the model
00:51:35
Speaker
to not make just one prediction but many potential predictions, be aware of those predictions, and then give the user control over which of them they're interested in. And it's also making the user aware that, look, the question you're asking is actually far more nuanced than you realize, or maybe you have those nuances in your head but you're not able to translate them into words right now. And so let's
00:51:59
Speaker
lengthen this conversation you're having with this chatbot or this model, and let's get to the bottom of what you're interested in. That sounds really promising to me, more user control in this particular way. Yes, absolutely, and it seems very interesting to me also. I think
00:52:19
Speaker
one barrier to doing something like this is that right now AI development is a very monocultural field in many ways: monocultural in terms of geographical diversity, but particularly in terms of disciplinary diversity.

Interdisciplinary Collaboration in AI

00:52:41
Speaker
Unfortunately, questions about the complexities of socio-cultural representation, and the ways in which human beings bring cultural values to bear on technological development, are not a huge conversation within the field, because computer scientists and engineers are not necessarily trained to think about those things. That doesn't mean that they're not smart, that doesn't mean they're not capable;
00:53:09
Speaker
they are, right? These are incredibly intelligent people with incredible skill sets, but as with any disciplinary training, their skill sets are limited to a very particular domain. As we are building technologies that are entering the social world, the world of representation, the world of cultural production, we need to expand the
00:53:34
Speaker
disciplinary lenses through which we are solving this problem. You and I can have this conversation, we can understand the complexities of representation, because our university training and our pedagogical training have forced us to think about all of these things.
00:53:51
Speaker
I didn't anticipate ending up on this note of "we should have more social science and humanities work in tech companies", but that's where we've ended up. This has been a fascinating conversation, Rida. I've learned a lot from you, and I hope our listeners have too. Thank you for being here. Thanks, yeah, thanks for having me. I had a great time, and I'm hoping there is
00:54:19
Speaker
something listeners take away from this. If there's something listeners take away, it's: be skeptical of technologies, be skeptical of your own cultural values in all of your work, and, wherever you can, if you are in a decision-making role at a technology company, think about your cultures of development and your cultures of deployment. That's my big spiel: our cultures of development and our cultures of deployment are always going to shape
00:54:49
Speaker
how technologies are developed and used, and if we don't think about that, it's a recipe for disaster. Awesome. Thank you, Rida. Thanks. I'd love to hear what you thought about this episode. Please leave a rating and a review on Google Podcasts, Spotify, Apple Podcasts, or wherever you're listening; this really helps the podcast reach the right audience.
00:55:19
Speaker
You can also email me at asadyakat at gmail.com with any feedback or any ideas you have for topics to cover or guests to invite. Hearing from people who are listening is a large part of what motivates me to record more episodes. So please don't hesitate to write. And thank you for listening.