01/02/2023

The Top 5 Types of Machine Translation Errors to Watch Out For


Machine translation (MT) has become such an important tool for simplifying the translation process that it has forever changed the translation industry. MT handles everything from quick translations of foreign restaurant menus to translating a whole e-commerce website into 50 different languages. It has proved very useful for crossing language barriers that would otherwise be challenging and time-consuming to overcome. But this technology is still far from perfect.

While popular machine translation engines like Google Translate can produce decent word-for-word translations, there's still a lot to be desired when it comes to translating longer texts. Even with the advent of neural machine translation, the latest iteration of MT, machines still produce errors that require post-editing to ensure accuracy.

In this article, the first of a three-part series, we'll focus on identifying the most common machine translation errors and understanding the factors that contribute to them. Not only will this lead to faster post-editing, but it will also help improve the overall quality of your translations.

To put these errors in context, let's look at how one particular industry encounters them: the media.

 

The Significance of Machine Translation in the Media

Machine translation has become very important in the news industry because it allows media outlets to quickly and accurately translate articles into multiple languages, making news content accessible to larger audiences. With the ubiquity of social media and the Internet, the demand for real-time, multilingual news coverage has greatly increased, and machine translation has become vital to meeting this demand. By using machine translation, news organizations can reach a larger audience, increase their readership, and make important stories heard by more people around the globe.

Machine translation is rapidly gaining popularity in the translation industry, but it's not always smooth sailing, especially in nuanced industries like the media. The news industry is not the most straightforward place to put MT into action. Translating news stories demands precision, as even a tiny mistranslation can mislead readers. It's not just about translating words; it's about understanding the story and conveying it in another language so that new readers understand it the same way. The integrity and accuracy of news articles are crucial and can't be overlooked. That's why translating media content requires a careful touch and a deep understanding of the subject matter.

Let's look at excerpts of Arabic and Malaysian news texts and their Google Translate translations to identify the errors the machine produces. All of the examples in this article are taken from two studies: "Machine Translation: The Case of Arabic-English Translation of News Texts" by Noureldin Mohamed Abdelaal & Abdulkhaliq Alaazawie, and "Error Analysis in Translation of Quotations in Online News Feature" by Anis Shahirah Abdul Sukur & Rokiah Awang.

 

Types of Machine Translation Errors

 

Example 1

Source text: قال الصحفي مصطفى بكري ، عضو مجلس النواب ، إن فريق أحمد شفيق يسارع للإعلان عن ترشحه لرئاسة الجمهورية ، حيث كان مقررا له من فرنسا يوم 22 ديسمبر الجاري.

Translated text: Journalist Mostafa Bakri, a member of the House of Representatives, said that the team Ahmed Shafiq is rushing to announce his candidacy for the presidency, where he was scheduled to announce it from France on 22 December.

 

1. Morphological Errors

  • These errors happen when the morphology, that is, the grammatical structure of words, is flawed or misinterpreted. Morphology refers to the structure of words and how they are formed and combined.

  • Example 1 shows a morphological error: the source text verb "تعجل" is in the past tense, but the translation renders it in the present progressive, "is rushing."

 

Example 2

Source text: وتجرى الانتخابات الرئاسية دون وجود لمرشحين معارضين أو منافسين بارزين؛ جراء انسحابات سابقة من السباق الرئاسي متعلقة بالمشهد السياسي، لا سيما لكل من المرشحين المحتملين، خالد علي، ومحمد أنور السادات، والفريق المتقاعد، أحمد شفيق

Translated text: Presidential elections are held without the presence of prominent opposition candidates or competitors; previous withdrawals from the presidential race are related to the political scene, particularly to potential candidates Khaled Ali, Mohamed Anwar Sadat and retired team member Ahmed Shafiq.

 

Example 3 

Source text: "Gembira kerana di sini dapat pitih (wang) lebih. Anak-anak pun dah pergi negara lain belajar," katanya. 

Translated text: “I am happy staying here, can get the money and my children can go abroad for further education," she said with a smile.

 

2. Semantic Errors

  • A semantic error is essentially a "logic error": the translation conveys the wrong meaning.

  • Example 2 showcases a semantic error. The source word الفريق, which here refers to the military rank of lieutenant general, is translated as "team." This happens because الفريق is a homonym: the same word also means "team." Homonymy causes many mix-ups, which result in semantic errors.

  • Another semantic error occurs in the same example. The source word جراء, which means "as a result of," is dropped from the translation, and this changes the meaning of the whole sentence.

  • The source text states that the candidates' withdrawals from the race caused the lack of prominent opposition candidates and competitors in the election. The translation loses this cause-and-effect relationship.

 

Example 3 shows another semantic error. This text comes from a direct quotation in a news feature, in which the speaker's exact words are reproduced in quotation marks (Stovall, 2005). The story is set in Kampung Bujang, Thailand, where the speaker, Tania Kelian, uses informal language. In the Malay version, the speaker uses the word "pitih," the word for money in the speaker's hometown. The translation loses this local context by rendering "pitih" simply as "money." Because "money" is a generic term that can refer to any currency, the background information is lost. To preserve the meaning, the post-editor should either keep "pitih" with an explanation (the local word for money) or add "(money)" beside "pitih."

 

Example 4 

Source text: ووفق قرار الهيئة الوطنية للانتخابات بمصر (رسمية)، تنتهي مساء الأحد، عملية الاقتراع في اليوم الثالث والأخير لتصويت المصريين بالخارج في التاسعة مساء بتوقيت كل دولة.

Translated text: According to the decision of the National Electoral Commission in Egypt (Sunday), the voting process ends on the third and final day of voting for the Egyptians abroad at 9:00 pm.

 

Example 5

Source text: “Ibu bapa juga berpeluang melawat anak-anak mereka lebih kerap, kerana untuk sampai ke Medan hanya mengambil masa lebih kurang sejam setengah (penerbangan),” katanya. 

Translated text: “Parents too will be able to visit their children more often, as it is only one hour by flight to Medan,” he said.

 

3. Lexical Errors

  • These errors happen when words are added to or left out of the translation, so that it no longer carries everything the source text says.

  • Example 4 shows this: the translation leaves out part of the source text. The phrase "بتوقيت كل دولة," which means "local time of each country," is missing from the translated text.

  • Because of this omission, the meaning changes. The translation suggests that voting ends at 9:00 pm for all Egyptians abroad, but 9:00 pm in which time zone? This missing information creates confusion for readers of the translated text.

  • In Example 5, the translation misrepresents the source text by getting the flight time wrong. The source text gives the time as "sejam setengah" (an hour and a half), while the translation says one hour, omitting "setengah" (half). This is a serious error because it distorts a key piece of information and will mislead any reader.

 

Example 6 

Source text: بدأ، مساء الأحد، إغلاق صناديق الاقتراع في الانتخابات الرئاسية المصرية بالخارج، التي يتنافس فيها مرشحان أحدهما الرئيس الحالي عبد الفتاح السيسي.

Translated text: On Sunday evening, the polls closed in the Egyptian presidential elections abroad, in which two candidates are competing, including current President Abdel Fattah al-Sisi.

 

4. Syntax Errors

  • Syntax errors occur when words and phrases are put together in a way that produces unclear or nonsensical sentences. Syntax is the arrangement of words into well-formed sentences in a language.

  • In Example 6, the translation is fairly accurate, but the syntax is distorted. The translation makes it seem that the polls closed abroad, rather than the Egyptian elections held abroad.

  • The meaning would be clearer if the two clauses, which describe two different events happening in different parts of the world, were split into separate sentences, even though the events are clearly interrelated.

  • The sentence would also read better with the preposition "for," as in "... the polls for the Egyptian presidential elections."

 

Example 7

Source text (error)               Machine output (corrected)

pelrbagai                         pelbagai
countand                          Country and
kereta apa                        keret
kajayaan                          kejayaan
Lady williams                     Lady Williams
“me and my family…”               “Me and my family”
“Our target is… holidays”         “Our target is … holidays.”
Tan berkata… lemak dan garam      Tan berkata… lemak dan garam.
daripafa                          daripada
kereta apa                        kereta api
news paper                        newspaper

 

5. Orthographic Errors

  • These errors occur when the machine makes unrequested corrections to misspelled words in the source text. This category also includes mishaps in punctuation and capitalization.

  • Example 7 lists words from the source text that contain spelling and punctuation errors, alongside the machine-translated output for each.


These examples show that Google Translate is far from perfect. It uses neural machine translation, which according to Castilho, Gaspari, Moorkens, and Way (2017) produces much better output than statistical machine translation systems. Even so, it still leaves considerable room for human post-editing.

 

Why is it important to understand what the most common types of machine translation errors are?

Here are a few reasons why it’s important to identify machine translation errors:

It improves translation quality. By knowing and understanding the common errors that MT models make, translators can more swiftly and effectively address these mistakes during the post-editing process, leading to much better translations.

It saves time. Quickly recognizing and fixing common errors during post-editing can save a lot of time and increase efficiency during the translation process.

It leads to continuous improvement. Identifying and tracking the types of errors MT systems make can help improve those systems over time through the feedback gathered from translators' corrections.

It promotes better decision making. Knowing the common errors can help translators make informed decisions about when to use MT and when to rely on human translation.

It ensures consistent quality. By being aware of common errors, translators can ensure that the final translation is consistently good.

To wrap up, recognizing the different kinds of machine translation errors is crucial to ensure the accuracy and quality of the final output as well as improve the efficiency of the translation process. The topic of machine translation errors and the importance of their identification is a complex and multifaceted one.

To explore the subject further, stay tuned for the upcoming articles in this series on Machine Translation Errors and Machine Translation Post-Editing (MTPE). These articles will dive deep into the various aspects of MTPE and provide insights and best practices to help ensure the quality of machine-translated texts. Whether you're a professional translator, a language service provider, or just interested in the latest developments in machine translation, these articles are a must-read.