July 12, 2022

Meet VALHALLA, the Machine Translation Model That Hallucinates

Have you ever played around with DALL-E mini?

It’s currently one of the internet’s favorite toys. All you have to do is type in a prompt—say, for example, world-renowned chef Gordon Ramsay as a clown—and the machine will generate a set of images showing what you input.


Fun, isn’t it? What if we told you that similar technology is being used to develop better machine translation?

That is what researchers at the MIT-IBM Watson AI Lab and UC San Diego have done with VALHALLA (Visual Hallucination for Machine Translation). This is a new machine learning model that uses “hallucinated” (that is, automatically generated) images alongside input text in a source language to translate into a different language.

What VALHALLA does is a form of multimodal machine translation (MMT), which means machine translation that draws on other sources of information, such as images, in addition to the source text. In this case, images are used in an intermediate step between the text input and the output in the target language.

An example the researchers provided is the sentence “A snowboarder wearing a red coat is going down a snow-covered slope.” Instead of translating directly from English into the target language, the model is also trained to generate a visual representation of the snowboarder as described, which it then uses alongside the text to produce the translation.
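For the technically curious, the two-step flow described above (text in, a generated visual representation in the middle, translation out) can be sketched in a few lines of Python. This is only an illustration of the data flow under our own assumptions, not the researchers' code: the class `HallucinatingTranslator` and the functions `toy_hallucinate` and `toy_translate` are hypothetical stand-ins, and a real system would use trained neural networks in their place.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a generated visual representation.
VisualTokens = List[int]


@dataclass
class HallucinatingTranslator:
    """Illustrative two-stage pipeline; all names here are assumptions."""

    hallucinate: Callable[[str], VisualTokens]      # source text -> visual tokens
    translate: Callable[[str, VisualTokens], str]   # (text, visual tokens) -> output

    def __call__(self, source: str) -> str:
        # Intermediate step: generate the visual representation from the
        # text alone, so no real photograph is required as input here.
        visual = self.hallucinate(source)
        # The translation step sees both the text and the generated visuals.
        return self.translate(source, visual)


def toy_hallucinate(source: str) -> VisualTokens:
    # Toy placeholder: one small integer per word instead of a real image.
    return [len(word) % 8 for word in source.split()]


def toy_translate(source: str, visual: VisualTokens) -> str:
    # Toy placeholder: reports its inputs instead of actually translating.
    return f"[translation of {source!r} conditioned on {len(visual)} visual tokens]"


model = HallucinatingTranslator(toy_hallucinate, toy_translate)
print(model("A snowboarder wearing a red coat"))
```

The point of the sketch is the shape of the pipeline: the only input is the sentence itself, and the visual step happens inside the model.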

Research has already been done on other kinds of image-based MMT to improve the quality of a translation, but these models need ground-truth images paired with the text during both the training and testing phases to work. VALHALLA is the first to use images that are automatically generated, thus getting around this particular limitation.

The results have been surprisingly positive. The researchers found that VALHALLA showed improvements over pure text-based models, particularly in less common language pairs.

Models like VALHALLA are still experimental, and it’s unlikely we’ll see industry-wide applications just yet. We’re curious to know how efficient it can get compared to the text-only systems that are in industrial use today. Its performance on increasingly abstract sentences that may not be representable in images also remains to be seen.

For our part, we believe that current state-of-the-art systems are still the more feasible option for providing high value at a much lower cost. And nothing yet beats the watchful eyes and imaginative capacity of a human post-editor. So if you’re in need of expert machine translation solutions, feel free to reach out. We’re up for the challenge.