testing upload - turncating?

3.3 Prod Up!!!
9 Plays · 2 years ago

testing upload

Transcript

Podcast Introduction

00:00:11
Speaker
Welcome, everyone, to the next episode of the Search Off the Record podcast. Our plan is to talk a bit about what's happening at Google Search, how things work behind the scenes, and maybe have some fun along the way.
00:00:24
Speaker
My name is John Mueller. I'm a search advocate on the Search Relations team here at Google in Switzerland. I'm joined by Martin, who's also on the Search Relations team. Our special guests today are Zineb and Bruno, also from Google.

Importance of Inclusive Language

00:00:37
Speaker
Today, we'll be talking about inclusive language. It's a topic that has become more and more top of mind in recent years, so I'm excited to find out more. Thanks for joining us. It's great having you guys here today. Hey, John. I'm not a guy.
00:00:50
Speaker
Oh, of course. Sorry. I should have known better than to say you guys in general. Thanks for the reminder. Anyway, Zineb, it's great running into you again. It seems like an eternity ago that we worked together as webmaster trends analysts to help site owners in the forums and at events. What have you been working on nowadays? Can you introduce yourselves for our listeners?
00:01:14
Speaker
Yeah, sure. So I'm now a program manager. I work with the Google Assistant team, and more specifically on the infrastructure side. And Bruno, what are you working on? Hey, I'm Bruno, and I'm a linguist working on Google Search, especially on query understanding.
00:01:30
Speaker
Yes, and Bruno and I both collaborated in a working group around inclusive language with a specific focus on French for now. Ooh, that sounds interesting. Inclusive language. Can you explain a little bit what that means?

Understanding Inclusive Language

00:01:44
Speaker
Yes, sure. So by inclusive language, we mean a language that is actually free from words or phrases that are stereotyped or discriminatory. So for a very long time, and in many languages like French, for instance, the masculine form was the default expression in any context.
00:02:00
Speaker
So for example, people would say, "During the executive meeting, chairmen are leading the discussion." This could be a chairwoman, but the masculine would actually be used. So recently we've seen a very strong movement where people are being more inclusive when they speak and when they write, and they try to really honor and embrace all of the different identities out there.
00:02:24
Speaker
So for example, in English, we would either make sure to mention the gender when we speak, for instance, how to become a policeman or a policewoman, or we would use a gender neutral form like how to become a police officer.
00:02:38
Speaker
To give you a non-gender related example, because this is not only about gender, we also try to be mindful of some expressions that are not inclusive of culture, race, ethnicity, or the actual state of the world. So for instance, in tech, for a very long time, we used expressions like blacklist or whitelist, and sometimes we still do because those are habits that are hard to change.
00:03:00
Speaker
But we should prefer using more inclusive terms like a block list or an allow list. And at Google, we see more and more engineers doing that actually. Another example, you would often hear people say in the minority group when referring, for instance, to a smaller group than the group that is visually bigger in the room or in the country. But
00:03:25
Speaker
we should prefer using something like the minoritized group or even the underrepresented group. So those are only a few examples. Sounds like we really have to shift our focus a little bit. But why is this important for Google Search?

Google's Steps Towards Inclusivity

00:03:39
Speaker
Yeah, actually, one of the three values of Google is to respect the user. So we really want to make sure that all our users are represented, regardless of their gender, their sexual orientation, or even their beliefs. And gender bias has always been a very, very strong concern for us.
00:03:58
Speaker
If you remember, a long time ago, image search was showing only images of women when you were looking for a nurse, for example, and if you were looking for doctors, you would get only images of male doctors. So we've been actively working on changing this bias and being more inclusive in how we represent the two genders.
00:04:22
Speaker
But there is still a lot to do, especially in terms of inclusive writing. So what is inclusive writing then? So inclusive writing is all kinds of practices or styles that we use to make sure that readers feel really included when reading, regardless of their gender, their ethnicity, or their socioeconomic status.
00:04:47
Speaker
In many languages other than English, it's rather hard to represent a gender-neutral term. So as Zineb mentioned, police officer or fire person are gender-neutral terms that don't refer particularly to a man or a woman.

Examples of Inclusive Writing

00:05:06
Speaker
But in other languages, when there is no such gender-neutral term, the trend now is to always refer to both terms. So using, for example, firemen and firewomen when you refer to the fire person, or to use a contracted form with a specific character. Specific character. Could you give us an example or multiple examples of that in different languages then?
00:05:33
Speaker
Yes, sure. I can give you some examples in English to start with. As Bruno mentioned, we would say fireman or firewoman or a fireperson. But then you have also more generic examples like the use of the word mankind, for instance. We use it all the time, but we should prefer using humanity so that you don't have man in that word.
00:05:58
Speaker
Another example that many people probably use, that I use as well, but again, it's a mental training all the time. In the office, we often tend to greet our teammates with "hey guys, how are you guys?" Just as John said at the beginning of this podcast: "hey guys." But we should prefer using "hey folks" or "you all," which is more neutral.
00:06:22
Speaker
Another example as well is, for instance, the term Latino, which refers to men from Latin America, but we used to use it to refer to all of the Latino community. But Latina actually refers to women, right? So the current trend today to include both genders is to use Latinx, with the letter X at the end.
00:06:46
Speaker
Yeah, so those are just a few examples in English, but maybe Bruno, you would like to add some more examples in other languages. Yeah, absolutely. In French and in German, there is more and more this tendency to abbreviate the feminine and the masculine form together. So, for example, instead of saying étudiant, so student, in the masculine form, and then adding "et étudiante," so the feminine form, we would just contract everything together
00:07:16
Speaker
with a special character. So it would be étudiant, followed by a special character, like a dash or a slash, and then the final ending e, which is the feminine form. And more recently, we witnessed an increasing use, especially in French, of a very special character, the middle dot, which is called in French the point médian.
00:07:39
Speaker
You can now find it on Gboard on mobile phones by long pressing the dot key. I see something similar in German as well, where they also have the slash, and I've seen some use a star as well for the ending. In practice, is this something that you see is widely accepted?

Middle Dot in Writing

00:08:04
Speaker
For the point médian, well, it's actually a very hot topic right now, especially in France, but we do witness an increasing use of the point médian. I do see it more and more on social media, like from friends or acquaintances who are using it. I don't know if I'm paying more attention to it or if people are actually using it more.
00:08:22
Speaker
But in any case, I think that people have different ways of writing inclusively. And I'm going to say that it's not up to us at Google to basically decide and say whether people should use the middle dot, the normal dot, the slash, the star, or the hyphen. But I think
00:08:40
Speaker
it's our responsibility to make sure that our users, our Google users, can use any form that feels natural to them and that we will still parse and understand their content as expected. Cool. So it sounds like Google has been doing some things already on this front. Do you have some examples of what has been done so far?
00:08:59
Speaker
Yeah, absolutely. So in our working group, the first action we took was to work with the Gboard team to make this point médian, this middle dot, more accessible. Before, it was hidden in a second or third layer of the Gboard. And now, as I said, just by long pressing on the normal dot, you have access to this point médian and can easily write it if you want.
00:09:29
Speaker
But we also started working more on the infrastructure side, talking with the Search ranking team to see how these new inclusive writing practices are handled at indexing and ranking. Ooh, that's cool. So how does it actually work in Search right now?

Challenges in Indexing

00:09:49
Speaker
And how, you know, are algorithms dealing with the middle dot on web pages these days?
00:09:55
Speaker
That is an excellent question, Martin. Thank you for asking, because I wanted to ask it to you, actually. So, to you and John, basically. So if a website uses étudiant·e, like "student" with the middle dot and an e, on their page, can it be retrieved as well for users who search for étudiant in the masculine form and étudiante in the feminine form?
00:10:19
Speaker
Oh, that's a good question. I don't answer ranking questions. Maybe, John? Oh, man. Way to throw me under the bus, Martin. So I don't know what the current details are in Search, but I do know it's something that the team is working on. From my understanding of Google systems, there are probably three aspects that are involved in this whole thing. So on the one hand, there is indexing.
00:10:47
Speaker
So we try to extract the words from the individual documents that we find on the web, and we keep them in our index. And that way, we can try to find the documents that match specific queries. And one option, or perhaps one part of the solution, could be to automatically expand some of those words into the appropriate versions, maybe drop the punctuation if that's not a natural word boundary, but actually a sign that what is meant here is that there are different versions.
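Editor's sketch: the index-time expansion John describes as "one option" can be pictured with a few lines of Python. This is only an illustration under stated assumptions, not how Google Search works. The separator list, the pattern, and the names `expand_token` and `index_terms` are all hypothetical, and the pattern only covers the French "e" / "es" endings discussed in this episode.

```python
import re

# Assumed separators for French inclusive forms: middle dot (U+00B7),
# period, hyphen, slash, asterisk. Hypothetical and not exhaustive.
INCLUSIVE = re.compile(r"(?P<base>\w+)[·.\-/*]e(?:[·.\-/*]?(?P<plural>s))?")


def expand_token(token: str) -> list[str]:
    """Return the masculine and feminine forms of an inclusive token,
    or the token unchanged when it does not look like one."""
    match = INCLUSIVE.fullmatch(token)
    if match is None:
        return [token]
    base = match.group("base")
    plural = match.group("plural") or ""
    return [base + plural, base + "e" + plural]


def index_terms(text: str) -> list[str]:
    """Lowercase, split on whitespace, trim outer punctuation,
    and keep both expanded versions of each inclusive form."""
    terms: list[str] = []
    for raw in text.lower().split():
        terms.extend(expand_token(raw.strip(".,;:!?()\"'")))
    return terms


print(index_terms("Chaque étudiant·e est inscrit·e."))
# ['chaque', 'étudiant', 'étudiante', 'est', 'inscrit', 'inscrite']
```

A pattern this naive would also rewrite abbreviations such as "i.e", which is one reason the punctuation can only be dropped when it is not a natural word boundary.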
00:11:17
Speaker
And similarly, we could maybe automatically expand on the versions that we do find in the documents. And in our index, keep both of these versions, or whatever versions are appropriate to keep. The second aspect is more about ranking, which I guess is the ranking question, which is all about the serving side of things. So when people enter a query, what happens on Google's side?
00:11:44
Speaker
And we've seen from the previous episodes of the podcast that we do automatically kind of expand the query that we see based on known synonyms, abbreviations, different versions of different words. And in practice, these systems tend to run automatically. So when we see that new synonyms are being picked up and used by users, we try to pick that up and use that automatically.
00:12:09
Speaker
because there's no way for us to keep up with what people are searching for, and people search for different terms all the time. So that's another place that could come into play. And third, one place that I think we don't talk about that much is everything about understanding entities,
00:12:28
Speaker
both in the content and in queries and the individual attributes that are assigned there. So for example, we know the Eiffel Tower is a structure, and it has a certain height. You can ask Google how tall is the Eiffel Tower. For other entities, the gender may also play a role. And all of this comes initially from the Knowledge Graph, and it tends to be built up automatically based on the content that we find online.
00:12:53
Speaker
However, that doesn't mean it's automatically always correct. For example, if you ask, who is the second lady of the United States? Well, the role is still the same, but it's different now because the US vice president is a woman and she's married to a man. Therefore, the title is now different. And this kind of bias needs to be improved both in our languages and, of course, in Google systems.
00:13:19
Speaker
That makes sense. Also, what I think makes this a little trickier is we mentioned a bunch of different ways of doing this and a bunch of different ways of doing inclusive writing. What would you suggest folks are doing when they create content online?
00:13:34
Speaker
Well, right now there is no official recommendation, and I also don't think it is our role as Googlers to say how it should be written, but we do see content being written using various forms, like various punctuation marks, middle dots, stars, slashes, parentheses, using the neutral form, mentioning all the genders in a sentence, etc.
00:13:56
Speaker
I think that all of them are valid as long as they make readers feel included. What's your thought, Bruno? Yeah, I think it's also really dependent on what you are writing. It's true that if you are writing full text to be read, a special character might be hard for the reader to read.
00:14:17
Speaker
Although I have to say that there is no clear psycholinguistic evidence that a new special character is harder than any other abbreviation that you haven't seen before when you read a text. But if you are making lists of very short content, or forms, or section headers, you can really start using a condensed or abbreviated inclusive writing, because it makes everyone feel more included.
00:14:45
Speaker
I also like to mention this example: in France, the ID card has always had the mention "born in," with the place of birth of the person, and "born" is abbreviated né(e), with an e in parentheses. So there is an inclusive abbreviated form that lets "born" agree with female or male, depending on the holder of the ID card. So in any type of form, you can use this kind of condensed writing.
00:15:13
Speaker
But again, we are not in a position to tell people how they should write. The only recommendation we can give is: write in the way that is most natural to you, and make sure that your text is easily readable.
00:15:31
Speaker
Cool. Now, I imagine if people write naturally, then they would automatically include synonyms and variations as well anyway. So some of that probably plays in, I don't know, Search's favor, in that when people write naturally, they don't always just use the exact same word over and over again. You provide some variety.
00:15:53
Speaker
But all of this sounds very focused on written content. And more and more, voice search is a thing. Would this also work for spoken information? So if I went to my Google Home and asked about fireman jobs, would it be able to recognize that on a page? And how would that get pronounced?
00:16:14
Speaker
So the pronunciation of an abbreviation is always a tricky part for human beings. When we read a text with an abbreviation, we don't necessarily always know how to read it. "Mr." being read as "mister" is something that we have to learn.
00:16:30
Speaker
The middle dot is interesting because, again, it's an abbreviation of the two forms, the masculine and the feminine. So naturally, if I see étudiant·e, I would pronounce "étudiant et étudiante," so the two forms.
00:16:48
Speaker
But for computers, for text-to-speech systems, it's of course a little bit trickier, because again it's an abbreviation that we need to handle, but that can happen with any term. So we are working on it, but of course it's a much longer-term project.
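Editor's sketch: Bruno's reading rule (pronounce both forms when you see the middle dot) can be written as a small text-normalization step that would run before speech synthesis. This is a hypothetical illustration, not the Assistant's pipeline. The function name `spell_out`, the pattern, and the choice of "et" as the joiner are assumptions, and only the middle dot (U+00B7) with the French "e" / "es" endings is handled.

```python
import re

# Base word, middle dot, feminine "e", optional plural "s" (with or without
# a second middle dot). Hypothetical pattern for illustration only.
MIDDLE_DOT_FORM = re.compile(r"(\w+)·e(?:·?(s))?\b")


def spell_out(text: str, joiner: str = "et") -> str:
    """Rewrite each middle-dot form as 'masculine joiner feminine'
    so a speech system has two ordinary words to pronounce."""

    def both_forms(match: re.Match) -> str:
        base = match.group(1)
        plural = match.group(2) or ""
        return f"{base}{plural} {joiner} {base}e{plural}"

    return MIDDLE_DOT_FORM.sub(both_forms, text)


print(spell_out("Chaque étudiant·e"))   # Chaque étudiant et étudiante
print(spell_out("Les étudiant·e·s"))    # Les étudiants et étudiantes
```

Feminine forms that change the stem rather than just adding an e would need a dictionary instead of a pattern, which hints at why this is described as a longer-term project.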
00:17:07
Speaker
It was really cool to hear that Assistant and other voice systems are working on that. But since we are Search Off the Record and we are here to talk about Search, how should search engines handle this, in your opinion?

Future of Inclusive Writing

00:17:20
Speaker
In my opinion, I would say that search engines should adapt to the way people write online. So as Bruno mentioned, if it feels natural to the content owner, then search engines should adapt to that. So there is a clear trend, at least in French, for adopting inclusive writing online and as
00:17:38
Speaker
search engines, we need to make sure that it is supported by our crawling, indexing, and ranking systems, and that we also show inclusive results. So for instance, it would be great if kids, when they search for a doctor or a football player, a footballer in French, or a tennis player or an engineer, see a variety of search results, so that both little boys and little girls can identify with those professions, for instance.
00:18:07
Speaker
Yeah, we have seen that as a challenge in the past. And is that something that Google Search teams are aware of and are working on already? Or is that something that we need to raise more awareness of internally? So we recently started with our working group to get in touch with different teams that can get involved. And this is actually how we got in touch with you, John and Martin. So this is just the beginning. We know it's a very long endeavor to make
00:18:36
Speaker
all the little parts of the system more inclusive and more sensitive to inclusivity. But we are hoping to solve the bigger pain points very soon, both so that content creators are confident that they can write the way they want, and so that their readers can read the content as they like.
00:19:00
Speaker
Cool. Now, it sounds like there's still a lot of work. Would it be a positive thing if search engines just understood the one most common way of doing this, or would they need to support all of the different forms?
00:19:12
Speaker
I think that if they support the most common forms of inclusive writing, it's already a great thing. That would be phenomenal. And over time, I think in a couple of years or so, I wouldn't be surprised if we see that each language ends up having its own common or agreed-upon inclusive writing practices that most writers would be following.
00:19:33
Speaker
But for now, I would say starting with the most common ones that we see online is enough. And obviously this is not a one-off change. Languages are dynamic, and search engines should still take this into account, and they should still take into account any new inclusive writing form if that form becomes prominent over time.
00:19:53
Speaker
So do you think over time there will become a single way of doing inclusive writing? So maybe people should just wait until there's one common thing, or maybe even one common way of doing it across all languages, or how do you see that evolving?
00:20:08
Speaker
So I really think each language is different. As we highlighted before, in some languages it's really easy to have a neutral form like fire person, but in some others this neutral form doesn't exist and is not possible at all. That's why there is this need to have a special character or abbreviation to take into account the male and female versions of a term.
00:20:34
Speaker
Now, about whether there is one common way to do inclusive writing: I think, as Zineb said, languages are very dynamic, and we now see a trend toward using inclusive writing, but the usage, which abbreviation to use, is still quite inconsistent.
00:20:58
Speaker
But we can hope that the more people use it, there will be a larger acceptance of one in favor of another, for example, or a kind of standardization. In the history of languages, this is how it happens all the time. You have hesitation about using one term over another, and the more it evolves, the more there is a kind of standardization across the language community.
00:21:24
Speaker
You can take, for example, the gender-neutral pronoun "they." In English, using "they" to refer to one person in a neutral way was introduced, or became more and more frequent, maybe 30 years ago. And now nobody's questioning the use of "they," but 30 years ago, some people were questioning it. So it's just a question of evolution over the years.
00:21:53
Speaker
Cool. So it sounds like people shouldn't wait. They should jump in and use inclusive writing, even if they're not 100% sure which version is the one that will become maybe more popular over time. They should already start doing these things. Is that about correct?
00:22:13
Speaker
Yeah, absolutely. OK. So all of this still feels a little bit complicated to me, because probably I'm just used to the way things worked until now, or the way that I saw things working.

Handling Inclusive Content in Search

00:22:25
Speaker
But it's been really insightful. So let me see if I got this right. Ideally, search engines would automatically show a diverse set of content across all different aspects. So if you search for something like fireman, it would automatically include different versions,
00:22:40
Speaker
including websites that mention firewoman or fireperson. And similarly, search engines should be automatically able to use the different common forms of inclusive writing when they're found on a website. So the goal should be that these pages are automatically processed and accessible for any searches that are related to that.
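Editor's sketch: the query-side behavior summarized here (a search for "fireman" also reaching pages that say "firewoman" or "fireperson") can be pictured as variant-group expansion. The table below is hand-written and hypothetical, using only words mentioned in this episode. The podcast says real systems pick up synonyms automatically, so treat this as a toy model of the idea rather than of any actual implementation.

```python
# Hypothetical, hand-written variant groups built from words in this episode.
VARIANT_GROUPS = [
    {"fireman", "firewoman", "fireperson"},
    {"policeman", "policewoman"},
]

# Map every known term to the full group it belongs to.
LOOKUP = {term: group for group in VARIANT_GROUPS for term in group}


def expand_query(query: str) -> list[list[str]]:
    """For each query word, list every variant that should also match."""
    return [sorted(LOOKUP.get(word, {word})) for word in query.lower().split()]


def matches(query: str, page_text: str) -> bool:
    """True when the page contains at least one variant of every query word."""
    page_words = set(page_text.lower().split())
    return all(
        any(variant in page_words for variant in variants)
        for variants in expand_query(query)
    )


print(expand_query("fireman jobs"))
# [['fireman', 'fireperson', 'firewoman'], ['jobs']]
print(matches("fireman jobs", "Firewoman jobs in Zurich"))  # True
```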
00:23:01
Speaker
It sounds like it might be tricky, given there are so many different ways of doing inclusive writing, even when just looking at one language. But just because something is hard doesn't mean it's not worthwhile doing. And, again, ideally, site owners, they should just
00:23:18
Speaker
use inclusive writing techniques whenever they're appropriate. And people like you or listeners should continue pushing search engines so that it kind of keeps working better. Because without this gentle nudge across the board, I think it's very easy also for search engines to say, well, it seems to be working OK. Maybe we don't need to change anything. But perhaps we do need to change some things.
00:23:45
Speaker
Is that about a reasonable summary? Yes, very good. Thank you, John. I would add just one thing that is obvious, but it's worth repeating. Write for your users, and your users are diverse. You don't only have an audience of male or female readers, so make sure that everyone feels included when they read whatever you write online. Cool.
00:24:04
Speaker
Well, I think this has been a super interesting topic.

Conclusion and Resources

00:24:08
Speaker
And it sounds like there's a lot more information available outside for people who want to dig in. Do you have some examples that we could point people at? Yes, we'll post some links in the description of this podcast. But I would start with the authoritative websites. Basically, the UN has a few pages about inclusive writing in various languages. And then there's also the Wikipedia page that explains what inclusive language is. And we'll add more links
00:24:34
Speaker
also. Cool. Okay. We'll try to drop those links in the description so that people can take a look if they're interested in finding out more.
00:24:44
Speaker
Cool. Well, that's it for this episode. Thank you for joining us here, folks. It's been fun with these podcast episodes. I hope you, the listener, have found them both entertaining and insightful as well. And regardless, let us know how you're liking these. If there are topics that we could be including in one of the future episodes, feel free to drop me a note on Twitter or chat with us at one of the next virtual events that we go to.
00:25:08
Speaker
And of course, don't forget to like, subscribe, and update your links. Thank you, and goodbye, everyone.