Gender bias is a thorny problem in MT development. But it is more than a technical problem: it has ethical and sociological dimensions as well. People involved in developing MT systems therefore need to adopt a more holistic perspective on language and gender bias in machine translation.
This was one of our main takeaways from a recent journal article on the topic by Beatrice Savoldi, a PhD candidate at the University of Trento and at Fondazione Bruno Kessler, and her colleagues. Beatrice kindly gave us her time to discuss the article and to answer a number of questions about gender bias in machine translation.
You can access their article here: Gender Bias in Machine Translation.
How does gender bias show up when it comes to machine translation?
In a cross-lingual application such as Machine Translation (MT), gender bias emerges most blatantly when dealing with languages that encode gender information differently. Whereas English expresses gender on pronouns (e.g., he, she, or they), some languages such as Hungarian do not and are almost “genderless”. On the opposite side of the spectrum, Italian or Spanish have a complex gender system, where feminine and masculine markings appear on nouns, articles, adjectives, or even verbs. Thus, for an MT system, translating gender can pose a one-to-many problem: when confronted with a word such as professor in Italian, should it be professore (masculine) or professoressa (feminine)?
In this scenario, MT systems tend to favour masculine forms in translation, except when troublesome stereotypical associations come into play. Then, feminine gender might be generated in association with less prestigious occupations or activities (she cooks, he works), or even controversial descriptors (clever professor as masculine, but pretty professor as feminine). Such behaviours are not limited to ambiguous translation and can impact the representation of explicitly gendered referents, too.
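The one-to-many problem Beatrice describes can be made concrete with a toy sketch. The word list and function below are purely illustrative (not any real MT system's data): they simply show that, for an ambiguous English source word, more than one gendered translation is equally valid, so a system that always picks the masculine form is making an unforced choice.

```python
# Purely illustrative data: a handful of English words whose Italian
# translation forces a gender choice the source sentence does not make.
# (Masculine form first, feminine form second.)
AMBIGUOUS = {
    "professor": ("professore", "professoressa"),
    "friend": ("amico", "amica"),
}

def gender_alternatives(word: str) -> set[str]:
    """Return every equally valid gendered translation of an ambiguous word."""
    masculine, feminine = AMBIGUOUS[word]
    return {masculine, feminine}

print(gender_alternatives("professor"))
```

A system that systematically outputs only one member of each such set, regardless of context, is exhibiting exactly the masculine-default behaviour discussed above.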
Gender bias in MT can be witnessed by any user, for example, anyone who relies on the commercial systems available online. Perhaps with more pernicious effects, when reading automatically translated webpages or social media posts, we might be presented with biased gender assignments without even realising it.
Why is it important for us to address gender bias in MT?
Language is a powerful tool that can enable the visibility of social groups, but through which we also reveal prejudices and propagate stereotypes. Thus, given the ubiquitous role that language technologies occupy in our daily lives, concerns over the appropriate use of gendered language have extended to them, too.
Indeed, gender bias in MT leads to feminine under-representation and yields a reduced quality of the MT service offered to women. And if we account for non-binary language and individuals, they are completely absent from current MT systems.
Note that biased MT outputs do not merely reflect our own societal biases, but can actively reinforce them. Such is the case of stereotypical translations, which can feed into existing prejudiced assumptions and negative generalizations (e.g., only men are qualified for high-level positions).
As MT is increasingly deployed at large scale and in different scenarios, ranging from social media to work-related activities and even legal texts, gender bias has the potential to affect a wide array of people. Additionally, since MT output can be used as textual data to develop future language technologies, biased language will be fed into and propagated by the next generation of models as well.
Can you give us a quick overview of your findings from your research work, "Gender Bias in Machine Translation"?
The paper is a literature review of current studies on the understanding, assessment, and mitigation of gender bias in MT. First and foremost, the review showed that this rapidly growing area of research has been characterized by disparate, technically oriented efforts, based on occasionally incompatible and narrow conceptualizations of gender bias.
Since gender bias in MT is a sociotechnical problem, in the paper my colleagues and I systematized existing work on the topic, also informed by relevant knowledge and notions from neighbouring fields, like sociolinguistics and the social sciences. Also, to counter the widespread but limited view that singles out gender asymmetries in the training data as the ultimate cause for bias in MT, the paper overviews several other constraints and factors that contribute to its emergence.
Finally, we identified a series of blind spots and challenges for gender bias research in MT. Among them is a trajectory towards the inclusion of non-binary gender and language, which is now gaining momentum. We also underscored how work on gender bias in MT has been largely limited to text-to-text systems; the implications of gender bias for non-textual modalities, however, might be different. Indeed, in our work we found that speech-to-text systems exploit speakers’ vocal characteristics as a gender cue to improve feminine translation. However, relying on physical gender cues (e.g., pitch) for such a task implies reductionist gender classifications, making systems potentially harmful for a diverse range of users.
What are the ways that we can address gender bias in MT? In your own practice, what have been the challenges?
As mentioned above, different mitigation strategies for (binary) gender bias in MT have been put forward. Some of the most popular approaches involve injecting external knowledge into the model (e.g., the speaker’s gender) to guide translation. Others instead attempt to prevent models from learning stereotypical associations at training time by creating more balanced training data. For ambiguous queries, an additional rewriting step can also be applied to the MT output, so as to always obtain both masculine and feminine translation alternatives.
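To make the first strategy concrete, here is a minimal sketch of how external gender knowledge is often injected: a pseudo-token carrying the speaker's or referent's gender is prepended to the source sentence before translation, so a model trained with such tags can condition on it. The tag format below is a hypothetical example, not the markup of any specific system.

```python
def tag_source(sentence: str, referent_gender: str) -> str:
    """Prepend a pseudo-token encoding the speaker's/referent's gender.
    The "<F>"/"<M>" tag format is an illustrative assumption, not an
    actual system's convention; it restricts itself to binary gender,
    which is itself one of the limitations discussed in the article."""
    if referent_gender not in {"F", "M"}:
        raise ValueError("expected 'F' or 'M'")
    return f"<{referent_gender}> {sentence}"

print(tag_source("I am a professor.", "F"))
```

A tag-aware model seeing the `<F>` token could then prefer a feminine rendering such as "Sono una professoressa" over the masculine default.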
Generally, and in my own experience as well, one of the main obstacles for debiasing approaches is improving gender translation while preserving overall translation quality. For instance, we have experimented with different segmentation techniques applied to the decoder side of speech-to-text systems. When segmenting at the character level, we found that translations were slightly less fluent than the state of the art, but feminine translation improved. It would thus be relevant to assess end users’ own perception of this apparent trade-off.
Besides technical interventions on the model, however, a set of best practices must guide the whole MT creation pipeline. These include dedicated test sets, evaluations, and careful annotation practices. When I first started working on the topic, there was no benchmark available to measure bias on naturally occurring instances of gender translation; only simple, synthetic corpora. Hence, we created the MuST-SHE benchmark, a multilingual, dedicated resource for more realistic testing conditions. As natural data are complex and gender translation is language-specific, its creation required extensive manual work for data collection and annotation. Right now, my colleagues and I are facing similar challenges in the creation of a test set for gender-neutral translation, devoid of gender marking (e.g., service instead of waitress/waiter). As parallel data are lacking and gender-neutral language is highly language-specific, we need to manually identify relevant gendered cases and generate a viable neutralized translation.
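The kind of term-level evaluation that a dedicated benchmark enables can be sketched very simply. The function below is a much-simplified illustration in the spirit of such evaluations (the actual MuST-SHE protocol is more elaborate): given an output sentence and an annotated pair of gendered forms, the gender-marked word counts as correctly translated only if the correct form appears and its wrongly gendered counterpart does not.

```python
def gender_term_correct(hypothesis: str, correct_form: str, wrong_form: str) -> bool:
    """Simplified term-level check: True only if the annotated correct
    gendered form occurs in the MT output and the wrongly gendered
    alternative does not. Whitespace tokenization is a toy assumption."""
    tokens = hypothesis.lower().split()
    return correct_form.lower() in tokens and wrong_form.lower() not in tokens

# E.g., for a feminine speaker, "professoressa" is annotated as correct:
print(gender_term_correct("sono una professoressa", "professoressa", "professore"))
```

Aggregating such checks over many annotated sentences yields separate accuracy figures for masculine and feminine forms, which is what makes the masculine skew measurable in the first place.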
This article is part of our series of expert interviews on the topic of gender bias in machine translation. Check out the other parts here: