One of the most important ongoing tech projects of this century is the creation and refinement of natural language processing software.
Google has now applied its Google Neural Machine Translation (GNMT) system to the software behind Google Translate, which can analyze a whole sentence at a time rather than translating it piece by piece.
This technique improves translation accuracy and, as researchers recently found, it also lets the system perform zero-shot translation: translating between a pair of languages it was never explicitly trained to translate between.
For example, the team behind the software taught it to translate from Portuguese to English and from English to Spanish.
The AI was then able to extrapolate from the input data and translate from Portuguese to Spanish.
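To make the setup concrete, here is a minimal sketch of how a single multilingual model can be told which language to produce. It assumes the approach described in Google's multilingual paper, in which an artificial token such as <2es> is added to the start of the source sentence. The function names and the sample sentence are hypothetical illustrations, not Google's code, and no actual translation model is involved.

```python
# Hypothetical illustration only: this tags requests, it does not translate.

# Language pairs the model was trained on, taken from the example above.
TRAINED_PAIRS = {("pt", "en"), ("en", "es")}


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the language the model should output."""
    return f"<2{target_lang}> {sentence}"


def is_zero_shot(source_lang: str, target_lang: str) -> bool:
    """A request is zero-shot when its language pair never appeared in training."""
    return (source_lang, target_lang) not in TRAINED_PAIRS


if __name__ == "__main__":
    # Portuguese to Spanish was never a training pair, so this is zero-shot.
    print(tag_source("Bom dia", "es"))
    print(is_zero_shot("pt", "es"))
```

Because the target language is just another token in the input, nothing in the request format stops a user from asking for a pair the model never saw. Whether the output is any good depends on what the network learned internally.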
The research paper published on the phenomenon says, “To our knowledge, this is the first demonstration of true multilingual zero-shot translation.”
But even more surprising than this latest development in AI’s translation abilities is that the software seems to have created its own internal language in order to do so.
“Visual interpretation of the results shows that these models learn a form of interlingua representation for the multilingual model between all involved language pairs,” says the paper.
Here, ‘interlingua’ refers to a shared internal representation of meaning that the network develops on its own, one that is not tied to any single human language. In this case, it appears to fill the gaps between the language pairs the system was trained on, which is what makes zero-shot translation possible.
The research team published a blog post alongside the paper, which said, “Using a 3-dimensional representation of internal network data, we were able to take a peek into the system as it translated a set of sentences between all possible pairs of the Japanese, Korean, and English languages.”
In other words, the AI appears to encode the meaning of a sentence in a form shared across the languages it was taught, and it uses that shared encoding to translate between language pairs it has never seen together.
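One way to picture this kind of check is to compare the internal vectors for sentences in different languages and see whether sentences with the same meaning sit close together. The sketch below uses cosine similarity on a handful of invented 4-dimensional vectors. The numbers, the "weather" and "train" labels, and the function names are all made up for illustration, and this is not the visualization method the researchers used.

```python
import math

# Invented toy vectors keyed by (language, meaning). Real models use
# vectors with hundreds of dimensions taken from the network itself.
VECTORS = {
    ("en", "weather"): [0.90, 0.10, 0.00, 0.20],
    ("ja", "weather"): [0.80, 0.20, 0.10, 0.20],
    ("ko", "weather"): [0.85, 0.15, 0.05, 0.25],
    ("en", "train"): [0.10, 0.90, 0.30, 0.00],
    ("ja", "train"): [0.20, 0.80, 0.40, 0.10],
}


def cosine(a, b):
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query):
    """Return the key of the other sentence whose vector is most similar."""
    others = [k for k in VECTORS if k != query]
    return max(others, key=lambda k: cosine(VECTORS[query], VECTORS[k]))


if __name__ == "__main__":
    for query in VECTORS:
        print(query, "is closest to", nearest(query))
```

In this toy data, every sentence's nearest neighbor shares its meaning rather than its language, which is the pattern an interlingua-like representation would be expected to show.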
This is a significant advance over the word-by-word or even phrase-by-phrase translation that used to characterize computer translation software.
In the past, when internet users tried to translate larger bodies of text from one language to another, the software's inability to interpret meaning was apparent: it would suggest words that made little sense in the wider context of the sentence.
Prior to that, even basic syntactical errors were common in translations. The AI’s development and subsequent use of its own interlingua could continue to advance the ability of computers to process and produce language at a near-human level.