Google’s DeepMind AI Roundly Beats Humans in Lip-Reading

Using its AI prowess, Google is prepping lip-reading software which may be the best we have ever seen. Made in partnership between the company’s DeepMind division and the University of Oxford, the software used five thousand hours’ worth of footage from the BBC to train its neural network.

As a result, the program, called Watch, Listen, Attend and Spell (WLAS), can achieve an accuracy of 46.8 percent while watching a video. That may not sound like much, but compared to a human expert (with 10 years of experience), who could only get words right 12.4 percent of the time, it is record-shattering.
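To put those two figures side by side, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes both percentages measure the same kind of word accuracy on comparable clips, which the article does not spell out, and the variable names are ours.

```python
# Figures quoted in the article (word accuracy, in percent).
# Assumption: both numbers are measured the same way on comparable video.
wlas_accuracy = 46.8
human_accuracy = 12.4

# How many times more accurate the program is than the human expert.
ratio = wlas_accuracy / human_accuracy

# Absolute difference, in percentage points.
gap_points = wlas_accuracy - human_accuracy

print(f"WLAS is about {ratio:.1f}x as accurate as the human expert")
print(f"Absolute gap: {gap_points:.1f} percentage points")
```

In other words, the program gets nearly four times as many words right as the professional, even though it still misses more than half of them.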

The program was trained on a dataset of more than a hundred thousand natural sentences, using source material from the BBC. The program then annotated more than 200 videos, and after hours of training it reached this stage. The University of Oxford has previously trained a program called LipNet to achieve 93.4 percent accuracy in tests.

However, those tests happened in more controlled settings, with trained volunteers speaking from a vocabulary of 51 words. Compare that to this particular instance, where the AI had to decipher more than a hundred thousand words from the BBC’s political talk shows, and all of a sudden WLAS’s feat becomes much more notable.

Remember that DeepMind is also the team behind AlphaGo, the AI which defeated world champion Lee Sedol at Go back in March, though as of today that win is no longer DeepMind’s only crown.


