Microsoft Develops AI That Learns Text-to-Speech from 20 Minutes of Audio

Text-to-speech technology has been revolutionized in the past few years. Speaking new languages and interacting with people from different linguistic backgrounds has become much easier, especially since the arrival of systems like Google's Translatotron.

The main obstacle for text-to-speech systems is training the AI. Training usually requires large amounts of recorded, transcribed speech, and even very large datasets do not guarantee natural-sounding results. Microsoft now reports a breakthrough on this problem.

Working with researchers in China, Microsoft has built a text-to-speech AI that produces realistic, if slightly robotic, speech after training on only about 20 minutes of audio samples.

How It Works

This text-to-speech AI is based on Transformers, a type of deep neural network. Neural networks are loosely inspired by the way neurons in the human brain connect and pass signals. Transformers differ from earlier designs in that they look at an entire input sequence at once instead of one element at a time, relating every position to every other position. This lets them process long sequences, such as sentences and audio, efficiently.
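Transformers relate positions in a sequence through a mechanism called self-attention: each position is scored against every other position, and its output is a weighted mix of the whole sequence. The sketch below is a minimal, illustrative version of scaled dot-product self-attention in plain Python. It is not Microsoft's implementation; as a simplifying assumption, it uses the input vectors directly as queries, keys and values, whereas real Transformers apply learned projection matrices first.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Simplified scaled dot-product self-attention.

    x is a list of equal-length vectors (one per sequence position).
    Assumption: queries, keys and values are the inputs themselves
    (no learned projections). Returns (outputs, attention_weights).
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in x:
        # Score this position (query) against every position (key).
        # Each row is independent of the others, which is why real
        # implementations can compute all rows in parallel.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of all value vectors.
        out = [sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d)]
        outputs.append(out)
    return outputs, weights


# Usage: three 2-dimensional vectors standing in for a short sequence.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = self_attention(sequence)
```

Each row of `attention` sums to 1 and shows how much that position draws on every other position in the sequence.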

Combined with a denoising auto-encoder, a network trained to rebuild clean input from deliberately corrupted versions of it, the system produces fairly natural-sounding speech. The approach is relatively simple, yet it works well.
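Denoising training works by corrupting an input on purpose and teaching the network to recover the original. The sketch below shows only the data-preparation half of that idea: it builds (noisy input, clean target) pairs by randomly replacing tokens with a mask symbol. The function names, the `<mask>` token, the masking probability, and the phoneme sequence are all hypothetical choices for illustration, not details from the study, and no model is trained here.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for corrupted positions


def corrupt(sequence, mask_prob, rng):
    # Replace each token with MASK independently with probability mask_prob.
    return [MASK if rng.random() < mask_prob else tok for tok in sequence]


def make_denoising_pairs(sequences, mask_prob=0.3, seed=0):
    """Build (noisy input, clean target) pairs for denoising training.

    A denoising auto-encoder would be trained to map each noisy input
    back to its clean target; this helper only prepares the data.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [(corrupt(seq, mask_prob, rng), list(seq)) for seq in sequences]


# Usage: an illustrative phoneme sequence for the word "hello".
pairs = make_denoising_pairs([["HH", "AH", "L", "OW"]], mask_prob=0.5, seed=1)
```

Keeping the clean sequence as the target is the key design choice: the network can only lower its reconstruction error by learning the structure of real speech or text, which is what makes this useful when labeled data is scarce.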


The results are not perfect, but they improve on what was possible before: the AI achieves a word intelligibility rate of 99.84 percent. Because it needs so little training data, this approach could make text-to-speech AI far more widely available.
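To make the intelligibility figure concrete, the sketch below computes a word intelligibility rate, assuming the metric is simply the share of words a listener could understand, expressed as a percentage. That definition is an assumption for illustration; the exact evaluation protocol behind the reported number is not described here.

```python
def word_intelligibility_rate(total_words, unintelligible_words):
    """Percentage of words a listener could understand.

    Assumed definition: (intelligible words / total words) * 100.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= unintelligible_words <= total_words:
        raise ValueError("unintelligible_words must be between 0 and total_words")
    return 100.0 * (total_words - unintelligible_words) / total_words


# Usage with hypothetical counts (not taken from the study):
rate = word_intelligibility_rate(2500, 4)
```

Under this definition, a hypothetical test of 2,500 words with 4 unintelligible ones scores 99.84 percent. These counts are invented for illustration only.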

A small amount of training data can go a long way. This new text-to-speech AI could not only help people with speech impairments but also serve as a solid foundation for future research.
