10/01/2023

Spotlight on source quality improvement with Intento’s Konstantin Savenkov

When we think about transcreation, AI is usually the farthest thing from our minds. After all, despite the many wonderful opportunities AI has opened up, human creativity and imagination remain outside its range.

But we must remember that AI is not meant to replace these things; it should instead serve as a tool in their service. That’s how new and innovative uses for the technology arise.

Previously, we touched upon an intriguing topic that SlatorPod discussed with Intento CEO Konstantin Savenkov: source text improvement. It automates the work of refining a source text before translation so that the translated output is better. In a bold statement, Konstantin claims it can help break bottlenecks in the transcreation workflow.

Konstantin describes it like this: when a sculptor makes a statue, it’s much easier to carve one from a big, solid rock than from another statue. In short, an “internationalized” or generalized block of text is easier for translation and transcreation professionals to work with and polish into a prime piece of language.

There’s a lot to discuss on the topic, but what’s already clear is that this process could bring great benefits further down the localization workflow.

But time and trends move fast, and source text improvement is now referred to as source quality improvement. This is a more apt description of what it aims to do, so we’ll use that term throughout this detailed follow-up interview with none other than Konstantin himself.

Tell us a bit about source quality improvement. What is it exactly, and what made you realize it was an important thing to look into?

Source quality improvement is the process of optimizing the source text before it is translated to produce a better translation. It can involve various tasks, such as simplifying the language, clarifying ambiguous or confusing passages, or ensuring that the text is properly formatted and follows style guidelines.

The importance of source quality improvement becomes apparent when you consider the fact that the quality of the translation is directly related to the quality of the source text. If the source text is poorly written, ambiguous, or difficult to understand, it will be more challenging for the translator to produce an accurate and fluent translation. On the other hand, if the source text is clear, concise, and well-written, it will be easier for the translator to produce a high-quality translation.

Source quality improvement can also be important for preserving the intended meaning and tone of the original text. For example, if the source text contains idioms, colloquialisms, or culturally specific references, it may be difficult for a translator to accurately convey these elements in the translation. By optimizing the source text before it is translated, it becomes easier to preserve the intended meaning and tone of the original text.

Overall, source quality improvement is a key component of the translation process, as it helps to ensure that the translated text is accurate, fluent, and faithful to the original.

So the basic principle behind it isn’t new: good MT input leads to better MT output, after all. What’s new is the use of AI and NLP tools like GPT-3 to achieve it. Can you give us your thoughts on that?

You're correct that the basic principle behind source quality improvement is not new; the idea that good input leads to better output has been a fundamental tenet of translation for a long time. However, the use of AI and NLP tools like GPT-3 is a genuinely new development in this field, and one that has the potential to revolutionize the way translations are produced.

One of the key advantages of using AI tools for source quality improvement is that they can quickly analyze the source text at scale and accurately identify areas that may need improvement. For example, an NLP tool can identify ambiguous language, repetitive phrases, or unnecessarily complex sentence structure and suggest ways to simplify or clarify the text. This can save human translators significant time and effort, as they no longer need to review the source text for these issues manually.

Another advantage of using AI for source quality improvement is that it can learn from expert feedback and data and improve over time, becoming more accurate and effective as it processes more text.

Overall, using AI and NLP tools for source quality improvement can significantly improve the efficiency and accuracy of the translation process and will likely become an increasingly important part of the industry in the coming years.
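To make the kind of automated analysis described above a little more concrete, here is a minimal Python sketch of two simple source checks: flagging sentences that exceed a word limit and flagging word sequences that repeat. It is an illustration only, not Intento's tooling; the 25-word limit, the three-word phrase length, and the function names are hypothetical choices, and real NLP tools go well beyond surface heuristics like these.

```python
import re
from collections import Counter

# Hypothetical thresholds chosen for illustration; a real tool would tune them.
MAX_WORDS_PER_SENTENCE = 25
PHRASE_LENGTH = 3  # number of words in a repeated phrase


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_long_sentences(text: str, max_words: int = MAX_WORDS_PER_SENTENCE) -> list[str]:
    """Return the sentences whose word count exceeds max_words."""
    return [s for s in split_sentences(text) if len(s.split()) > max_words]


def flag_repeated_phrases(text: str, n: int = PHRASE_LENGTH) -> list[str]:
    """Return n-word phrases that occur more than once, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(phrases)
    return sorted(p for p, c in counts.items() if c > 1)


if __name__ == "__main__":
    sample = "Our new plan is easy to use. Our new plan is also cheap."
    print(flag_long_sentences(sample))    # prints []
    print(flag_repeated_phrases(sample))  # prints ['new plan is', 'our new plan']
```

Checks like these only point a human or a model at candidate problems; deciding how to rewrite the flagged text is the harder part of the job.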

What kind of things would source quality improvement entail? How do you “improve” a source text?

There are many things that source quality improvement can entail, depending on the text's specific needs and the translation's goals. Some common tasks that might be involved in source quality improvement include:

Simplifying the language: If the source text is written in complex or obscure language, it may be difficult for a translator to accurately convey the intended meaning. In these cases, it can be helpful to simplify the language to make the text easier to understand.

Clarifying ambiguous or confusing passages: If the source text contains ambiguous or confusing passages, it may be necessary to rephrase or restructure the text to clarify the meaning.

Ensuring that the text follows style guidelines: Different industries, organizations, and cultures have different conventions for writing and formatting. To produce a translation that is appropriate for the intended audience, it may be necessary to ensure that the source text follows the relevant style guidelines.

Removing unnecessary information: If the source text contains extraneous information or tangents that are not relevant to the main message of the text, it may be helpful to remove this information in order to streamline the text and make it easier to translate.

Adding context or background information: In some cases, the source text may be missing important context or background information necessary for the translator to understand the text's intended meaning. In these cases, it may be helpful to add this information to make the text more translatable.

After the podcast episode, opinion seems to be divided about whether source quality improvement really is good for the transcreation process. Why do you think people might be for or against it?

It's common for people to have different opinions on the value of source quality improvement in the transcreation process. In general, I think it comes down to the same reservations people have about letting a machine do something only humans can do, especially if it's perceived as an art. We saw the same attitude toward MT in the past. Now that MT is common and widely adopted, those reservations have moved on to the new kid on the block.

There are several specific factors that could contribute to these differing opinions, including:

Personal experience: Some people may have had negative experiences with human translators editing the source text. They are likely to have even stronger reservations about letting machines do that.

Understanding of the role of source quality improvement: Some people may not fully understand the role that source quality improvement plays in the transcreation process and therefore may not see its value.

Beliefs about the role of the translator: Some people may believe that the translator's role is to work with whatever source text is provided and that it is not the translator's job to "improve" the source text. Even if it's a machine translator :-)

Beliefs about the limitations of source quality improvement: Some people may believe that source quality improvement has inherent limitations and that it is impossible to make STI (source text improvement) a standalone step before translation.

Overall, it is crucial to keep in mind that source quality improvement is just one part of the transcreation process and that it is not a one-size-fits-all solution. Different texts and translation goals may require different approaches, and it is important to consider the specific needs of each text to determine the most appropriate course of action.

What argument would you give to those who are skeptical about source quality improvement to convince them about its benefits?

There are a number of arguments that could be made to convince skeptics about the benefits of source quality improvement:

It can lead to more accurate translations: By optimizing the source text before it is translated, it becomes easier for the translator to accurately convey the text's intended meaning. This can help to ensure that the translated text is faithful to the original and does not contain any misunderstandings or inaccuracies.

It can save time and resources: By identifying and addressing potential issues with the source text before it is translated, source quality improvement can help to save time and resources that would otherwise be spent on revising or correcting the translation.

It can improve the overall quality of the translation: By optimizing the source text, it becomes easier to produce a translation that is accurate, fluent, and reads naturally. This way, the translation will be more effective for its intended audience.

It can help to preserve the intended tone of the original text: By optimizing the source text, it becomes easier to preserve the intended tone of the original text, which can be especially important when translating texts that contain idioms, colloquialisms, or culturally specific references.

Overall, I believe that source quality improvement has the potential to greatly improve the efficiency and accuracy of the translation process, and can help to produce high-quality translations that accurately convey the intended meaning of the original text.

Do you have any plans or projects already underway that are related to developing source quality improvement?

They are more than underway. About three years ago, Intento customers asked us how to use MT for real-time translation in customer and employee experience. One of the first hurdles we faced while building such solutions was making the text more translatable and adding missing context to keep MT from exposing its biases, which are especially harmful in such use cases.

Since then, we've built a range of STI tools, enabled in the automatic translation workflows of Intento Enterprise MT Hub. In the early days, they were based on rules, heuristics, and regular expressions. Then we switched to pre-trained multilingual models. This year, we onboarded the next wave of technology and found that with general pre-trained models such as GPT-3, source quality improvement can handle some of the most advanced challenges, such as transcreation for marketing content.

Today, we use these tools in production to help our customers translate billions of words.
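The early, rule-based stage of STI that Konstantin mentions can be pictured with a short Python sketch: an ordered list of regular-expression rewrite rules applied to the source before it goes to MT. The rules below (expanding "ASAP" and "w/o", replacing the idiom "a piece of cake", collapsing extra spaces) and the `pre_edit` function are hypothetical examples for illustration, not Intento's actual rules or code.

```python
import re

# Hypothetical pre-editing rules, applied in order: (regex pattern, replacement).
# A production system would load customer-specific rules instead of hard-coding them.
RULES: list[tuple[str, str]] = [
    (r"\bASAP\b", "as soon as possible"),   # expand an abbreviation
    (r"\bw/o\b", "without"),                # expand shorthand
    (r"\ba piece of cake\b", "very easy"),  # replace an idiom with literal wording
    (r" {2,}", " "),                        # collapse runs of spaces
]

COMPILED_RULES = [(re.compile(pattern, re.IGNORECASE), repl) for pattern, repl in RULES]


def pre_edit(text: str) -> str:
    """Apply every rewrite rule in order and return the edited source text."""
    for pattern, replacement in COMPILED_RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    print(pre_edit("Reply ASAP.  Setup is a piece of cake w/o any tools."))
    # prints: Reply as soon as possible. Setup is very easy without any tools.
```

Rules like these are cheap and predictable, but each one has to be written by hand and only covers the exact patterns it names, which plausibly helps explain the move to pre-trained models for harder cases.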