Google Translatotron Can Translate Languages in the Speaker's Voice

Lately, Google has been focusing on speech-related products. A few weeks back, Google announced that it was training its AI to help people with speech impairments. Last year, it announced interpreter mode for Google Assistant, and around the same time it added more accents and languages to the Google Translate app.

The most recent announcement is Translatotron, a speech translation model that converts speech in one language directly into speech in another, and does so in the speaker’s own voice.

Note that this isn’t available as an app or as a feature in Google Translate; it is currently being tested in-house by Google.

Google Translatotron

It’s a first-of-its-kind translation model intended to make speaking in other languages easier. It converts the user’s speech into another language while keeping the user’s voice. Conventional translation apps instead work in three steps: they convert the speech to text, translate that text, and then convert the translated text back to speech.

Even though this multi-step method is widely used, an error made at any step carries over into the steps that follow. This is why end-to-end translation could open the way for many future developments.
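To make the error-propagation point concrete, here is a toy Python sketch of a three-step pipeline. Everything in it is hypothetical: the function names, the tiny word lists, and the "misheard" word are made up for illustration and do not correspond to any Google API or to how real translation apps are implemented.

```python
# Toy stand-ins for the three steps of a conventional pipeline.
# All names and word lists below are hypothetical, for illustration only.

def toy_speech_to_text(spoken_words: str) -> str:
    """Pretend recognizer that mishears one word (an assumed error)."""
    misheard = {"ship": "sheep"}
    return " ".join(misheard.get(word, word) for word in spoken_words.split())


def toy_translate(text: str) -> str:
    """Pretend English-to-Spanish word lookup with a tiny made-up lexicon."""
    lexicon = {"ship": "barco", "sheep": "oveja", "arrives": "llega"}
    return " ".join(lexicon.get(word, word) for word in text.split())


def toy_text_to_speech(text: str) -> str:
    """Pretend synthesizer: wraps the text to mark it as audio."""
    return f"<audio:{text}>"


def cascade(spoken_words: str) -> str:
    """Speech -> text -> translated text -> speech."""
    return toy_text_to_speech(toy_translate(toy_speech_to_text(spoken_words)))


result = cascade("ship arrives")
print(result)  # <audio:oveja llega>
```

The translation and synthesis steps both do their job correctly here, yet the final audio says "sheep" instead of "ship", because the mistake made in the first step is passed along unchanged. A model that goes from speech to speech directly has no intermediate text where such a mistake can be locked in.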

How Does it Work?

According to Google, Translatotron is based on a sequence-to-sequence network model. A sequence-to-sequence model takes a sequence as input and produces another sequence as output. In this case, the sequences are spectrograms, visual representations of the voice that show how its frequency content changes over time. The model takes the spectrogram of the source speech and generates a target spectrogram in the desired language.
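For readers who have not seen a spectrogram before, the following Python sketch shows how one can be computed from a raw audio signal using a short-time Fourier transform with NumPy. It is only a generic illustration: the function name, the frame length, the hop size, and the 440 Hz test tone are assumptions chosen for this example, and the article does not describe how Translatotron itself prepares its input.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one with the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # The FFT of each frame gives that frame's frequency content.
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 440 Hz tone sampled at 8 kHz (assumed test signal).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
print(spec.shape)  # (61, 129)

# Each bin spans sample_rate / frame_len = 31.25 Hz, so 440 Hz falls in bin 14.
peak_bin = int(spec.mean(axis=0).argmax())
print(peak_bin)  # 14
```

The result is a sequence of 61 frames, each described by 129 frequency values. A sequence-to-sequence model of the kind described above would read such a sequence for the source speech and produce a new one for the translated speech.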

This reduces the risk of losing information along the way and is also faster than the speech-to-text-to-speech method. The generated voice is slightly robotic, but since the model is still under development, we have high hopes for the future.

To preserve the speaker’s voice, the model also includes an optional speaker encoder component. Samples of the speech translations are available on GitHub.

Via Google AI Blog