A new paper from Microsoft Research claims that the company’s speech recognition and transcription technology is now more accurate than professional transcriptionists.
Professional transcriptionists have a word error rate of 5.9% on the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic.
The error rate increases to 11.3% for the CallHome portion, where friends and family members have open-ended conversations.
“In both cases, our automated system edges past the human benchmark,” the paper states.
This marks the first time that human parity has been reported for conversational speech.
“The key to our system’s performance is the systematic use of convolutional and LSTM neural networks, combined with a novel spatial smoothing method and lattice-free MMI acoustic training,” the authors write.
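
For readers unfamiliar with how transcription error rates such as the 5.9% and 11.3% figures are typically computed, the usual metric is word error rate: the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The sketch below is an illustration only, not the scoring pipeline used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Illustrative sketch: words are split on whitespace, with no text
    normalization (casing, punctuation, filler words) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    # Total edits divided by the reference length.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 5.9% error rate corresponds to roughly six word-level mistakes per hundred reference words.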