Microsoft’s conversational speech recognition system has reached an error rate of 5.1%, improving on last year’s error rate of 5.9%. The system aims to recognize the words in a conversation as well as humans do.
The Significance of the Achievement
The new milestone means that, for the first time, computers can achieve the same word recognition accuracy as humans.
“Our research team reached that 5.1% error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year,” Microsoft said in a blog post on Sunday.
In October 2016, Microsoft announced that its Artificial Intelligence and Research team had developed a speech recognition system that makes the same number of mistakes as a professional transcriptionist, or fewer. The error rate the researchers reported at the time was 5.9%.
“Last year, Microsoft’s speech and dialogue research group announced a milestone in reaching human parity on the ‘Switchboard’ conversational speech recognition task, meaning we had created technology that recognized words in a conversation as well as professional human transcribers,” said Xuedong Huang, Technical Fellow, Microsoft.
‘Switchboard’ is a corpus of telephone conversations that the speech research community has used for more than 20 years. The task is to transcribe conversations between strangers on topics such as sports, which serves as a benchmark for speech recognition systems.
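The article does not spell out how the error rate is calculated. Assuming it is the standard word error rate used in speech recognition benchmarks (word substitutions, deletions and insertions divided by the number of words in the reference transcript), a minimal Python sketch looks like this. The function name and the sample sentences are illustrative and do not come from Microsoft's system or the Switchboard data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: this is the conventional word error rate definition;
    the article itself only says "error rate".
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one wrong word out of seven.
human_transcript = "did you watch the game last night"
machine_transcript = "did you watch a game last night"
print(f"{word_error_rate(human_transcript, machine_transcript):.1%}")
```

Under this definition, an error rate of 5.1% corresponds to roughly one wrong word in every 20 words of the reference transcript.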
To explore model architectures, the team used “Microsoft Cognitive Toolkit 2.1” (CNTK), a versatile deep learning toolkit. Microsoft’s investment in cloud computing infrastructure improved the effectiveness and speed of the team’s work.
The software giant has been working toward human parity for the past 25 years.
“Moving from recognizing to understanding speech is the next major frontier for speech technology,” the post read.