WAYS WE NEED TO #BREAKTHEBIAS IN THE VOICE INDUSTRY
The International Women’s Day website opens with the following request:
“Imagine a gender-equal world. A world free of bias, stereotypes and discrimination.”
It’s a world we all wish to be real – and a world many of us, regardless of industry or role, are working towards on some level.
Vixen Labs has always been equally or majority female-staffed, and our Senior Leadership Team is also majority women. So while we don’t play into the tech startup stereotype (hashtag hustle bros), we know that our composition is just a first step. Rest assured we’re working hard behind the scenes to make Vixen Labs more diverse and inclusive.
What we can’t forget is the bigger picture. The Voice industry has a number of entrenched biases that only exacerbate some of the inequalities in wider society. We spoke to our co-founder Jen Heape to get her views on these biases – and what the next steps are to break them down.
#BREAKTHEBIAS – VOICE RECOGNITION
Men have “always been taken as the standard human being”, says Invisible Women author Caroline Criado Perez in her astounding interview with Vox’s Sigal Samuel. This applies to many systems in historic and modern life: from car seat belts and treatment of pain to – yep, you guessed it – voice assistants.
“THE WAY IN WHICH SPEECH AI WORKS AT THE MOMENT, IN TERMS OF LANGUAGE PROCESSING, ACTUALLY EXCLUDES WOMEN AND OTHER MARGINALISED GROUPS ON CERTAIN LEVELS. FOR EXAMPLE, IT WON’T PICK UP CERTAIN CHOICES OF LANGUAGE SYNTAX QUITE AS WELL AS OTHERS.”
Put simply, speech recognition doesn’t work as well for women. It works even less well for women who aren’t white or well-educated.
There are several reasons for this: difficulties in acoustic modelling, too few women building the models, and a lack of female voices in the datasets AI learns from, among other contributing factors.
Whatever the mix of causes, the path to breaking this bias is increased intersectional diversity in voice tech: women of differing socioeconomic backgrounds, educational statuses, races, nationalities, and many more demographics.
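Before a team can fix an accuracy gap, it has to measure one. As a minimal sketch (not Vixen Labs tooling – the group labels, transcripts, and numbers below are invented for illustration), here is how one might compare word error rates across demographic groups in an evaluation set:

```python
# Minimal sketch: surfacing demographic gaps in speech recognition accuracy.
# All data below is hypothetical and purely illustrative.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def wer_by_group(samples):
    """Average WER per group from (group, reference, hypothesis) rows."""
    totals, counts = {}, {}
    for group, ref, hyp in samples:
        totals[group] = totals.get(group, 0.0) + word_error_rate(ref, hyp)
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

# Hypothetical rows: (speaker group, what was said, what was recognised)
samples = [
    ("group_a", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("group_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("group_b", "turn on the kitchen lights", "turn on the kitchen nights"),
    ("group_b", "set a timer for ten minutes", "set a time for tin minutes"),
]
print(wer_by_group(samples))
```

A per-group breakdown like this is a first diagnostic step: if one group’s error rate is consistently higher, the training data or acoustic model needs attention.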
HELPING MACHINE LEARNING TO, WELL, LEARN
After a few frustrating interactions, marginalised users will become aware that their accent, pitch, or choice of words is, for whatever reason, not being picked up correctly by speech recognition.
This awareness will lead to one of two behaviours: either the user stops attempting to get the assistant to understand them, or they modify how they are talking.
The first behaviour – ceasing to engage with an assistant – is a poor outcome for all involved, of course. The user doesn’t get the information they were seeking. The assistant fails at the very reason for its existence. And if the user was attempting to engage with a branded experience, such as an Alexa Skill for placing a food order, the company misses out on a transaction.
However, if a user chooses the latter behaviour (modifying how they are talking), they might still complete the interaction. But this modification comes with costs, too.
“You start to amend your speech patterns purely to fit a model with which you are trying to engage,” points out Jen. This rings true – who among us doesn’t have to enunciate differently to get a voice assistant to understand them? (Well, the white middle-class men among us, we imagine.)
“YOU START TO AMEND YOUR SPEECH PATTERNS PURELY TO FIT A MODEL.”
Speech modification isn’t just problematic in theory. It has implications for how voice assistants develop.
“Speech modifications mean that models don’t learn from your natural way of speaking, so they enter a feedback loop of reaffirming a certain way of speaking,” says Jen. The machine can’t learn as effectively, which is a problem given we’re discussing machine learning.
AI reaffirmation loops aren’t an issue we can solve in a blog post. Where we can start is by recognising that user datasets aren’t free of influence: people are constantly amending their choice of words, and how they speak, to reach an outcome.
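The feedback loop Jen describes can be made concrete with a toy simulation. This is purely illustrative – the styles, probabilities, and numbers are invented, not drawn from any real assistant – but it shows the mechanism: a recogniser succeeds more often on the speech style it has more data for, failed users amend their speech toward the dominant style, and retraining on the resulting logs entrenches the imbalance.

```python
import random

# Toy simulation (illustrative only) of an AI reaffirmation loop.
# "style_a" is the over-represented speech style; "style_b" is marginalised.

random.seed(42)

data = {"style_a": 800, "style_b": 200}  # hypothetical training-data counts

def recognition_rate(style: str) -> float:
    # Success probability proportional to the style's share of training data.
    return data[style] / sum(data.values())

for generation in range(5):
    share_b = data["style_b"] / sum(data.values())
    print(f"gen {generation}: style_b share of data = {share_b:.2f}")
    for _ in range(1000):  # one round of user interactions
        style = "style_a" if random.random() < 0.5 else "style_b"
        if random.random() > recognition_rate(style):
            # Failed users amend their speech to the dominant style...
            style = "style_a"
        # ...and only the (possibly modified) utterance enters the next
        # round of training data, reaffirming the dominant style.
        data[style] += 1
```

Even though half the users start each round speaking style_b, its share of the data shrinks generation after generation – the model never gets to learn from their natural way of speaking.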
And it’s not just the machines that learn from speech modifications. Users do, too. Say a woman’s instructions to a voice assistant are met with errors, but her male partner’s are understood and carried out. Will the woman gradually give up on using technology designed to make her life easier? Will she feel less confident using the latest devices? Will she decide to speak up less in day-to-day life?
These are all tricky questions, but it’s not far-fetched to posit that as users are rewarded for speaking in certain ways, their natural evolution of language (and other behaviours) is affected.
CAN WE #BREAKTHEBIAS OF ASSISTANT GENDERING?
The long-standing argument against digital assistants is that they reinforce gender stereotypes.
“Siri’s ‘female’ obsequiousness – and the servility expressed by so many other digital assistants projected as young women – provides a powerful illustration of gender biases coded into technology products”, wrote Mark West, Rebecca Kraut and Han Ei Chew in an open-access UN publication on gender divide.
The traditional response to this criticism was to create male voice options. Building on the efficacy of giving users a choice, Apple went further and removed Siri’s default voice altogether, requiring users to select the voice they want to hear.
But this remains problematic: what goes through users’ minds when they choose a female voice over a male one, or vice versa? And why is tech reinforcing a gender binary we know to be harmful to people whose identities don’t fit into a stereotypically male or female box?
Voice experience designers face a range of similarly fraught questions. Does one purposely subvert societal expectations? Fit in with them for the sake of cohesion? Or design in a way that draws people’s attention to the issue?
A couple of years ago, the creation of a supposedly genderless voice hit the headlines. Sadly, this isn’t a solution to the gendered-assistant debate.
Jen explains: “However much we may feel that a female-gendered voice assistant is problematic, a supposed ‘genderless’ voice is – due to the way different people personify voices in different ways – almost impossible.”
Apple is currently rolling out Siri’s genderless voice. Unofficially known as Quinn, the voice was originally recorded by a member of the LGBTQIA+ community. As Sarah Perez wrote for TechCrunch, “You may end up hearing Quinn’s voice and decide it sounds a bit more female or male to your ears. Though if you set your mind to hear it one way or the other, your interpretation may change to reflect your thinking.”
It seems that assistant gendering is one #BreakTheBias challenge that is much bigger than the technology embodying it.
“The bigger issue here is the way in which people will attribute certain behaviours and tasks to a male or a female voice. This is about our subconscious gendering of roles, societally,” agrees Jen.
“THIS IS ABOUT OUR SUBCONSCIOUS GENDERING OF ROLES.”
Again, Vixen Labs isn’t able to solve entrenched societal beliefs in a blog post. We can try to mitigate their impacts, though. We can commit to continual challenge, asking the big questions in design processes and user choices. We can try to shine a light on the biases and inequalities – as we are attempting to do here.
The seemingly intractable interdependency wrapped up in these discussions (technology influencing society, society influencing technology) can result in despair or surrender. Alternatively, it can be a call to action for each and every one of us: to do what we can to #BreakTheBias anywhere we can. In voice. In tech. In our everyday lives. And in ourselves.