Editing Internet Texts/Machine translation

Introduction

This project is dedicated to those interested in computer science and translation who would like to learn about the basic assumptions and principles of machine translation, as well as get acquainted with its history and newest inventions. The aim of this project is to present these complicated matters simply, so that they are understandable to readers with no prior knowledge of the field.

What is machine translation?

Machine translation, also referred to as MT or automated translation, is a field of computational linguistics which uses software to translate texts from one natural language into another. Because of globalisation there has been a growing demand for translating larger amounts of text in a shorter time, hence the increasing interest in researching the field and improving the software.

Brief history

• 1949 – 65 The term "machine translation" first appeared in Warren Weaver’s Memorandum on Translation (1949), and research in the field began in 1951 at MIT, with Yehoshua Bar-Hillel as the key figure. A research team from Georgetown University was the first to publicly present its system, in 1954. The presentation was promising enough to secure substantial funding for further research in the United States, and it gave rise to interest in MT research in other countries such as Japan and Russia. The first MT conference was held in London in 1956. In 1962 the Association for Machine Translation and Computational Linguistics was formed in the United States, and in 1964 the National Academy of Sciences formed a committee (ALPAC) to study MT.

• 1966 – 95 Expectations for MT were initially very high, but instead of making progress the researchers encountered serious obstacles which they could not immediately overcome. In response, ALPAC issued a report which stated that MT could not equal human translation and that funding for MT research should therefore be limited to a bare minimum. Despite the financial problems the research continued, and the first MT software was put to work by the French Textile Institute to translate abstracts from and into French, English, German and Spanish (1970). In 1971 Brigham Young University started a project to translate Mormon texts by automated translation, and in 1978 Xerox introduced Systran to translate technical manuals. Trados (1984) was one of the first MT companies, and the first commercial MT system for Russian/English/German-Ukrainian was developed at Kharkov State University in 1991.

• 1996 – 2016 In 1996 Systran offered free translation of small texts, and it was followed by numerous online networked services such as AltaVista Babelfish. MT started to be sold as software for personal computers and mobile phones, and came to be used for translating websites and electronic mail. The most recent innovation is the Google Neural Machine Translation system, introduced in 2016.

Types

Rule-Based Machine Translation

Rule-based MT is the earliest and simplest approach. It uses large collections of rules, manually developed over time by human experts, which map structures from the source language to the target language. The human factor in rule-based systems helps deliver fairly good automated translations with predictable results. However, because of the significant manual labor involved, rule-based systems can be quite costly and time-consuming to implement and maintain. As rules are added and updated, these systems can also generate ambiguity and suffer translation degradation over time.

The process

• Word for word translation

• Introducing language-specific rules
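
The two steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any real rule-based system is built: the tiny English-to-Spanish lexicon, the part-of-speech tags and the function names (word_for_word, apply_rules, translate) are all assumptions made for this example, and the single rule moves an adjective behind the noun it precedes.

```python
# Toy lexicon (an assumption for this sketch): English word -> (Spanish word, tag).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}


def word_for_word(sentence):
    """Step 1: look every word up in the lexicon, keeping the source word order."""
    tagged = []
    for word in sentence.lower().split():
        # Unknown words are passed through unchanged with an "UNK" tag.
        tagged.append(LEXICON.get(word, (word, "UNK")))
    return tagged


def apply_rules(tagged):
    """Step 2: apply one language-specific rule (the adjective follows the noun)."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            # Swap the pair and skip past it so it is not swapped back.
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2
        else:
            i += 1
    return result


def translate(sentence):
    """Run both steps and join the translated words into a sentence."""
    return " ".join(word for word, _ in apply_rules(word_for_word(sentence)))


print(translate("the red car"))  # el coche rojo
```

Without the second step the output would keep the English word order ("el rojo coche"). A real system needs many such rules, which is where the maintenance cost described above comes from.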

Statistical Machine Translation

The statistical model uses algorithms to compare all possible translations and chooses the best one based on statistics. Statistical models are trained on bilingual parallel corpora; while translating, they generate numerous probable translations and compare them to the training data to estimate which translation is the most likely one. This process is much quicker and more efficient than RBMT. However, if the bilingual data is insufficient or of bad quality (“data-dilution effect”), the system is not able to produce a proper translation.

The process

English	Polish
An overwhelming majority of the house (516 votes in favour, 133 against, with 50 abstentions) adopted a resolution officially laying down the European Parliament’s key principles and conditions for its approval of the UK's withdrawal agreement. Any such agreement at the end of UK-EU negotiations will need to win the approval of the European Parliament.	Posłowie wyraźną większością głosów (516 za, 133 przeciw i 50 wstrzymujących) przyjęli rezolucję w sprawie kluczowych zasad i warunków, od których spełnienia zależeć będzie zgoda Parlamentu na porozumienia dotyczące wystąpienia Zjednoczonego Królestwa z Unii Europejskiej. Każda umowa zawarta w wyniku negocjacji z Wielką Brytanią będzie wymagała akceptacji Parlamentu Europejskiego.

• Breaking the sentence into chunks and translating word-for-word

• Creating sets of possible translations

• Choosing the most probable set

Przeważającą większością posłowie przyjęli rezolucję.

This sentence is already acceptable as it stands; however, the training data would probably suggest that it would be more natural to say:

Posłowie wyraźną większością głosów przyjęli rezolucję.
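
The selection step can be illustrated with a very small Python sketch. Everything in it is an assumption made for illustration: the phrase table and its probabilities are invented, the "training data" is just two Polish sentences, and the scoring (phrase probabilities combined with an add-one smoothed bigram model) is a heavily simplified stand-in for what real statistical systems do.

```python
import itertools
import math
from collections import Counter

# Invented phrase table: source chunk -> {candidate translation: probability}.
PHRASE_TABLE = {
    "an overwhelming majority": {
        "przeważającą większością": 0.5,
        "wyraźną większością głosów": 0.5,
    },
    "of the house": {"posłowie": 1.0},
    "adopted a resolution": {"przyjęli rezolucję": 1.0},
}

# Invented target-language training data for the bigram language model.
CORPUS = [
    "posłowie wyraźną większością głosów przyjęli rezolucję",
    "posłowie przyjęli rezolucję",
]


def pad(sentence):
    """Add start and end markers around the words of a sentence."""
    return ["<s>"] + sentence.split() + ["</s>"]


BIGRAMS, HISTORY, VOCAB = Counter(), Counter(), set()
for line in CORPUS:
    tokens = pad(line)
    VOCAB.update(tokens)
    HISTORY.update(tokens[:-1])             # how often a word starts a bigram
    BIGRAMS.update(zip(tokens, tokens[1:]))  # how often two words are adjacent


def lm_logprob(sentence):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    tokens = pad(sentence)
    return sum(
        math.log((BIGRAMS[(a, b)] + 1) / (HISTORY[a] + len(VOCAB)))
        for a, b in zip(tokens, tokens[1:])
    )


def candidates(chunks):
    """Yield (sentence, log phrase probability) for every chunk order and option."""
    for order in itertools.permutations(chunks):
        options = [PHRASE_TABLE[chunk].items() for chunk in order]
        for combo in itertools.product(*options):
            text = " ".join(phrase for phrase, _ in combo)
            yield text, sum(math.log(p) for _, p in combo)


def best_translation(chunks):
    """Choose the candidate with the highest combined score."""
    return max(candidates(chunks), key=lambda c: c[1] + lm_logprob(c[0]))[0]


chunks = ["an overwhelming majority", "of the house", "adopted a resolution"]
print(best_translation(chunks))
```

With this toy data the sketch prefers the word order and wording found in the training sentences over the word-for-word candidate, which mirrors the example above. With different or poorer training data the choice would change, which is the weakness noted earlier.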

Neural Machine Translation

NMT is a relatively new model. Google was among the first to explore it, in 2014, and has since implemented it in Google Translate. Like Statistical MT, NMT learns from available data; however, it uses deep learning to build an artificial neural network.

Jay Marciano compared Statistical Machine Translation to a game of chess, in which players operate within a limited universe and make a limited number of moves. Like SMT, they calculate all possible moves to find the best one. Neural Machine Translation, by contrast, could be compared to playing the piano: even if you make a mistake, you can go back, solve the problem, and play the melody correctly. Neural MT systems are also not bound by rules as strict as those of chess; they find their own way to the best choices.

Neural MT is much more effective; however, it takes time for the models to learn. For this reason Google Translate, even with the model already implemented, still produces imperfect results. What differentiates NMT from other systems is the freedom it has in finding patterns and clues: the systems are not told what to look for, they work it out themselves. Another major difference is the ability to translate directly from one language to another even when little training data is available for that pair. Older systems usually used English as a mediating language, whereas NMT is capable of translating directly, e.g. from Polish to Korean.
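
The use of English as a mediating language in older systems can be pictured as chaining two translation steps. The sketch below is only an illustration of that idea: the word-level dictionaries and the function names (translate_words, pivot_translate) are assumptions made for this example, and real systems translate whole sentences rather than isolated words.

```python
# Toy word-level dictionaries (illustrative assumptions, not real system data).
PL_TO_EN = {"kot": "cat", "pies": "dog", "woda": "water"}
EN_TO_KO = {"cat": "고양이", "dog": "개", "water": "물"}


def translate_words(text, dictionary):
    """Replace each word using the dictionary; unknown words stay unchanged."""
    return " ".join(dictionary.get(word, word) for word in text.split())


def pivot_translate(text):
    """Polish -> English -> Korean, with English as the mediating language."""
    english = translate_words(text, PL_TO_EN)
    return translate_words(english, EN_TO_KO)


print(pivot_translate("kot"))  # 고양이
```

Because the second step only ever sees the English output, any error or ambiguity introduced in the first step is carried into the final translation. A system that translates directly between the two languages avoids this intermediate step.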

Exercises

Task

Translate the sentence "I want to go to the prettiest beach" into your native language using all four of the following translation engines:

• Google Translate
• Bing
• Free Translation
• Imtranslator

• Compare the translations
• Decide which one is the best
• Determine the types of mistakes (if there were any)
• Think about the possible reasons why the engine might have made mistakes

References

Michael Nielsen (2009). "Introduction to Statistical Machine Translation". Retrieved 2017-06-05.
Lionbridge Marketing (2017). "Neural Machine Translation: How Artificial Intelligence Works When Translating Language". Retrieved 2017-06-05.
Craciunescu, Olivia; Constanza Gerding-Salas, Susan Stringer-O'Keeffe (July 2004). "Machine Translation and Computer-Assisted Translation: a New Way of Translating?". Translation Journal 8 (3). http://accurapid.com/journal/29bias.htm. Retrieved 2017-06-10.
Adam Geitgey (2016). "Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences". Retrieved 2017-06-01.