Science

Want a medal? Microsoft 7.2% less bad at speech recognition than IBM

Clash of the machine-learning titans

Tue 22 Aug 2017 // 08:02 UTC

In a machine learning tug-of-war, Microsoft may have just barely slipped ahead of IBM for speech transcription accuracy.

Researchers are studying how to recognise human speech in a variety of settings – from real-time interactions to offline, pre-recorded voicemails. Boffins tell us that one application, particularly of offline transcription, could be government surveillance.

In March, IBM researchers claimed that they had achieved a word recognition error rate of 5.5 per cent for pre-recorded English telephone conversations between strangers on set topics such as sports. They're presenting their peer-reviewed research this week at the INTERSPEECH 2017 conference in Stockholm, Sweden.
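For readers wondering what a "word error rate" measures: it is conventionally the number of word substitutions, deletions and insertions needed to turn the machine's transcript into the human reference, divided by the number of words in the reference. The sketch below is a minimal illustration of that conventional definition, not the scoring pipeline either company used. The function name `word_error_rate` and the sample sentences are made up for the example, and real evaluations normalise the text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenises on whitespace and applies no text
    normalisation (case, punctuation, hesitations and so on).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one dropped word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a rate of 5.5 per cent means roughly one wrong, missing or extra word for every 18 words spoken.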

On Sunday, Microsoft published a blog post and technical whitepaper claiming it has achieved 5.1 per cent on the same task – a small improvement.
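The 7.2 per cent in the headline is a relative figure: the gap between the two error rates divided by IBM's rate. A quick sketch of that arithmetic follows, with `relative_reduction` as a made-up helper name and the only inputs being the 5.5 and 5.1 per cent figures reported above.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Percentage by which new_rate is lower than old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate * 100


if __name__ == "__main__":
    ibm_wer = 5.5        # per cent, IBM's claim from March
    microsoft_wer = 5.1  # per cent, Microsoft's claim

    absolute_gap = ibm_wer - microsoft_wer
    relative_gap = relative_reduction(ibm_wer, microsoft_wer)

    print(f"absolute gap: {absolute_gap:.1f} percentage points")
    print(f"relative gap: {relative_gap:.2f} per cent")
```

The absolute gap is 0.4 percentage points. The relative gap works out to about 7.27 per cent, which gives the headline's 7.2 when truncated rather than rounded.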

Like the IBM work, Microsoft's algorithms used deep learning architectures for acoustic and language modelling. Microsoft says it achieved a word error rate of 5.9 per cent last year and credits its bump to "using the most scalable deep learning software available, Microsoft Cognitive Toolkit 2.1 (CNTK), for exploring model architectures and optimizing the hyper-parameters of our models. Additionally, Microsoft's investment in cloud compute infrastructure, specifically Azure GPUs, helped to improve the effectiveness and speed by which we could train our models and test new ideas."

Eric Postma, a computer scientist at Tilburg University in the Netherlands who studies speech recognition, told The Register it is "a significant step forward" but "not a breakthrough" because the goal is to achieve human-level recognition – such as following speech when several voices talk at once at a cocktail party, or when understanding depends on common sense.

Microsoft admitted there's still tons of work to be done on recognising various accents, speaking styles and languages – not to mention comprehending conversations in crowded rooms with a distant mic.

And although IBM may claim that a 5.1 per cent error rate on this dataset would be human-level recognition, Postma said: "That's marketing, not science."

Phil Woodland, an information engineer at Cambridge uni who specialises in speech recognition and has worked on the same dataset before, told The Reg that "the error rates have come down significantly" since this problem was tackled in the early 1990s (on one 2004 telephone conversation dataset called RT-04, IBM researchers achieved an error rate of 15.2 per cent).

He pointed out that in addition to recognising speech between strangers, IBM's new paper also transcribed a dataset of speech between family members, who speak more casually, achieving an error rate of 10.3 per cent. By comparison, Microsoft's paper only tackled the "easier" problem – when strangers talk, their speech is more formal and easier to understand.

He said it's difficult to "pin down" a metric for human performance since it can vary from task to task. There's a chance the Microsoft algorithms might actually perform worse than IBM's on the harder dataset, or get similar numbers, he said.

It's also unclear if the Microsoft algorithms could apply to other datasets. It's possible that the researchers' algorithms might be tuned to work specifically on telephone conversations, and would not transfer to tasks such as voice search or transcribing broadcast data from media archives. ®
