The future of the internet and the translation industry with the rise of large language models

On 1 March 2023, a special “ChatGPT in Localization” online conference was held by Custom.MT. It was the first event of this kind, entirely devoted to the role of an innovative technology utilised by the translation and localisation business.

The conference was kicked off by Marco Trombetti, CEO of Translated, who gave a speech about artificial intelligence (AI), machine translation and the impact of the ChatGPT model on localisation and the future of the language industry.

Man versus Machine

Why is translation so important? The reason is that language is the most human thing ever to exist. Without it, we can’t evolve. Translations help us break the communication barriers between people who speak different languages, and this, in turn – according to Trombetti – will help us advance to the next step on the evolutionary ladder.

According to one view, machine translations are subpar when compared with human translations because machines can’t understand the day-to-day reality of human life or any larger context. To show that this view is wrong, Trombetti took part in the following experiment:

  • A sentence in English was prepared: “A language translator in the 90s translating on a computer. Oil painting.” Then it was translated into French using machine translation: “Un traducteur de langue dans les années 90 traduisant sur un ordinateur. Peinture à l’huile.”
  • Then, an encoder-decoder model was used. It is based on recurrent neural networks, which are found e.g. in natural language processing systems. This model was trained to “enter the mind” of the machine in order to interpret its thought process, and that capability was used to conduct the experiment.
  • A program fed with specific instructions created a system that could convert words to images, which resulted in the creation of an art piece expressing the same meaning as the original sentence.

Trombetti believes this experiment proves that machines understand the meaning of the texts they translate – even more so than we might suppose. It’s just that humans are not yet capable of instructing the machines well enough to extract the relevant information from them. Are the results of this experiment enough to dispel any doubts we might have and lead us to believe that AI is capable of a more in-depth understanding of the world around us? That’s a story for another article.

ChatGPT

What is ChatGPT, really? It’s a chatbot created by an American research lab called OpenAI. It works by predicting the next word in a given context (at the moment, this might be a text about 1,500 words long). Just like any other automated solution, it has its pros and cons:

  • Very good context identification. The software does not translate texts word-by-word or sentence-by-sentence, but “sees” the whole picture (or rather text), so there’s no danger of making mistakes resulting from misunderstanding the context.
  • Very capable of adjusting translation to guidelines provided. This model will prepare a translation in the required style – even a very formal one, if needed.
  • Average quality of translations. When a human editor checks a freshly made ChatGPT translation, they usually find a lot of mistakes – this chatbot makes more of them than popular machine translation tools, e.g. Google Translate.
  • Slow translation process. The program needs at least five seconds to process one word.
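The next-word prediction mentioned above can be illustrated with a deliberately tiny sketch. It is not how ChatGPT is implemented: it assumes a simple bigram model that counts which word most often follows another in a small sample text, whereas ChatGPT uses a large neural network and a much longer context. The function names and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical sample text, used only for this illustration.
corpus = "the translator reads the text and the translator writes the translation"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # prints "translator"
```

In this sample, “the” is followed twice by “translator” and once each by “text” and “translation”, so the sketch picks “translator”. A real LLM makes the same kind of choice, but bases it on the whole preceding context rather than on a single word.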

Although ChatGPT still struggles with numerous issues (e.g. hallucinations), Trombetti predicts that overcoming them is just a matter of time. One thing is certain, though – you will soon have quite a few alternatives, as brand new LLMs (large language models) are constantly being introduced to the AI market, e.g. BLOOM (BigScience), Chinchilla (DeepMind), or LLaMA (Meta).

The future of the translation industry

How can LLMs influence the translation business? People will use these tools to create content directly in the target language, and then employ the services of an experienced copywriter to verify the created text. This will allow us to produce better content at lower costs. Tools like ChatGPT will also aid content providers in preparing completely new content, as well as creating content from scratch on the basis of an already existing publication – the difference here being that the quality will be noticeably higher.

Although some concerns are still being raised in regard to this technology, for many, ChatGPT’s capabilities are nothing short of amazing. Why?

Firstly, they firmly believe that the system will be continuously improved. Even now, it runs really well – though it operates with minimum data resources. Secondly, it warrants highlighting that ChatGPT is not a replacement for other tools, like search engines or translation software. Think of it rather as an exciting alternative.

And all the signals indicate that people want to use it. This might stem from necessity or mere curiosity, but the fact remains that a record-breaking number of 100 million users were registered on ChatGPT’s website barely two months after its launch.

And finally, people are drawn by the idea of an open space surrounding the project, where everybody can play a part – the possibilities are endless, and data is free and open-access. The only obstacle is the cost of training LLMs, though even this might change rapidly.

What about the internet?

As Trombetti has pointed out, the whole internet is based on all kinds of content that users can interact with, and this, in turn, drives the business. The status quo has been disrupted by LLMs; therefore, there are two options left for content creators: they can either block LLMs from accessing certain publications, or they can create enormous amounts of content that will affect the way the model shares data with its users. Both options have their own limitations – also when it comes to implementation.
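One common way a site might try the first option is a robots.txt rule aimed at a specific crawler. The talk does not say which mechanism Trombetti had in mind, so the sketch below is only an assumption. It uses Python's standard `urllib.robotparser` to check such a rule. The crawler name `ExampleLLMBot` and the `example.com` URLs are hypothetical. Compliance with robots.txt is voluntary, which is one of the limitations of this approach.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one (made-up) LLM crawler out of /premium/,
# while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /premium/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(ROBOTS_TXT, "ExampleLLMBot", "https://example.com/premium/article")
open_page = is_allowed(ROBOTS_TXT, "ExampleLLMBot", "https://example.com/blog/post")
other_agent = is_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/premium/article")

print(blocked, open_page, other_agent)
```

Here the hypothetical crawler is denied the `/premium/` section but can still read the rest of the site, while other user agents are unaffected.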

So what does the future hold for the internet?

The best solution seems to be achieving a sort of synergy between language models and human-made content. By sharing the sources of their data with users, LLMs will drive site traffic and will be used by people to sell more products and services.

We must act quickly, though – language models might still be a novelty, but that won’t last forever. Soon, they will be replaced by even more advanced technologies and solutions.

Summing up his presentation, Trombetti said: “Translation is the hardest problem in AI, but the main source of inspiration.” Various state-of-the-art solutions are being worked on as of now, with one unifying goal: to help people communicate with one another.
