She is pretty, he is smart: More on gender bias in MT with Dr. Eva Vanmassenhove

“Why are women beautiful and men intelligent?”

Does this sound like a stereotype? That’s because it is. And it is the question that opens this video by Dr. Eva Vanmassenhove of the University of the Netherlands on the topic of gender bias in machine translation.

Eva begins by pulling up a screenshot of a Google Translate result from Hungarian to English. Unlike English, Hungarian has no gendered pronouns, so the system has to pick a gender for every pronoun it produces in English.

The result is startling: “Someone beautiful is female, someone clever, male. A woman should do the dishes, while a man builds. A woman cleans, cooks, raises children, and a man does research, teaches, and makes money.”

These results show the extent to which gender biases are present in machine translation. But what exactly is gender bias, and why is it such a pervasive problem? The video is a detailed and highly informative introduction to the topic, and well worth the watch.

A few questions for Dr. Vanmassenhove

How does gender bias show up when it comes to machine translation?

Gender bias is mainly visible in MT when you are translating between languages where one language doesn't require you to make gender explicit while the other one does. For instance, English is a language with barely any gender markers (except for referential "he", "she"...). When translating sentences such as "I'm beautiful" or "I'm a nurse" into other languages (e.g. French, Spanish, Russian), the MT system will pick a gender for you, and this will often result in biased translations.

Why is it important for us to address gender bias in MT?

It is important because we are leaving out translations that are equally valid from a linguistic point of view. Furthermore, there is no way to tell the system which gender is preferred. As a result, when translating a speech by someone who identifies as a woman or a man, the translation will most likely be only partially correct, since it will sometimes use the masculine and sometimes the feminine gender, depending on the data it was trained on.

Additionally, MT is sometimes used for downstream tasks. The talk gives an example: when you are looking for a "plumber" or a "nurse" and you use an MT system, half of the candidates will be eliminated.

What are the ways that we can address gender bias in MT?

There is this cool project, 'Fairslator' (www.fairslator.com), that does a pretty good job of providing multiple translations. The main challenge is that languages differ considerably when it comes to gender agreement and gender markers. As such, there is no one-size-fits-all solution.
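To make the idea of offering multiple translations concrete, here is a minimal Python sketch. It is not a description of how Fairslator works internally: the lookup table VARIANTS and the function translate_all are hypothetical names, and the hand-written table merely stands in for a real MT system. The point is only that the ambiguity is shown to the user instead of being resolved silently.

```python
# Toy illustration only: a hand-written lookup table stands in for an MT system.
# VARIANTS and translate_all are hypothetical names, not part of any real tool.
VARIANTS = {
    "I'm beautiful": {
        "masculine": "Je suis beau",
        "feminine": "Je suis belle",
    },
    "I'm a nurse": {
        "masculine": "Je suis infirmier",
        "feminine": "Je suis infirmière",
    },
}


def translate_all(sentence):
    """Return every gender variant instead of silently picking one."""
    variants = VARIANTS.get(sentence)
    if variants is None:
        raise KeyError(f"no toy translation for: {sentence!r}")
    # Return a copy so callers cannot modify the table by accident.
    return dict(variants)


# Show both equally valid French translations of a gender-neutral English source.
for gender, text in translate_all("I'm a nurse").items():
    print(f"{gender}: {text}")
```

A real system would have to detect which source sentences are ambiguous and generate the variants itself, which is where the differences between languages make things hard.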

Tackling gender in MT furthermore requires quite a bit of linguistic knowledge. People have experimented with adding features that incorporate information such as gender, to teach the system how to generate the right translation given a particular context. Others have focused on debiasing word embeddings, although this doesn't really resolve the issue for MT specifically; it attempts to make the data more fair.
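One common way to add such a feature is to prepend a tag to each source sentence, so the system can learn to condition its output on it. The following is a minimal sketch of that preprocessing step only, under stated assumptions: the tag tokens in GENDER_TAGS and the function add_gender_tag are made up for illustration, and training the MT model on the tagged data is not shown.

```python
# Hypothetical tag tokens; a real system would define its own vocabulary.
GENDER_TAGS = {"female": "<F>", "male": "<M>"}


def add_gender_tag(source_sentence, speaker_gender=None):
    """Prepend a speaker-gender tag so a model can condition on it.

    Sentences whose speaker gender is unknown are left untagged.
    """
    tag = GENDER_TAGS.get(speaker_gender)
    if tag is None:
        return source_sentence
    return f"{tag} {source_sentence}"


print(add_gender_tag("I'm a nurse", "female"))  # prints "<F> I'm a nurse"
print(add_gender_tag("I'm a nurse"))            # prints "I'm a nurse"
```

The same tags would need to be applied at translation time, which also gives the user a way to tell the system which gender is preferred.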