Television to the Rescue of Voice Recognition?
I am not a linguist or a computer programmer. But I do watch a lot of television. Well, not actually television, but television shows. Does anyone really spend that much time watching live television anymore? I digress.
What television shows on DVD have in common is subtitles: in English for the hearing impaired, and in other languages for the overseas market. And if you include the entire catalogue of TV shows from the last decade and a half, that is a lot of audio with corresponding text in multiple languages, including English.
Don’t get it? Let me explain. The main problem with voice recognition technology (yes, even with the Jesus phone’s newest toy, Siri) is that it has trouble with dialects, gender, accents, slang and on and on. Language is hard, and even harder if you are not human and not Skynet-powerful. Apple has been pretty clever in using a server-based system for Siri, but it is still in beta. Other systems require ‘teaching’ the computer to understand your voice, and if anyone else wants to use it, that means more teaching. So wouldn’t it be nice if you could fast-forward all this training?
And so we come back to television. Television shows feature many, many different kinds of language use: both genders, a wide variety of ages, accents and slang. All of it transcribed into text.
So you have the all-important audio (in digital format, no less), you have the corresponding text, and you have thousands upon thousands of hours of it. The work has already been done by the scriptwriters (for the original English version) and by an army of translators into German, Arabic, French, Chinese and every other major language on the planet. What’s more, new content is created every week, bringing updates to the words computers find hardest: popular culture words.
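To make the idea a little more concrete, here is a minimal sketch of a possible first step: turning a subtitle file into timed (audio span, text) pairs that a recognizer could train on. It assumes the subtitles have already been extracted to the common SubRip (.srt) text format (DVD subtitles are usually stored as images, so in practice that would need an OCR pass first) and that the audio is one track at a known sample rate. The names `Segment`, `parse_srt` and `to_sample_ranges`, and the 16 kHz default, are hypothetical choices for this illustration, not anyone's actual system.

```python
import re
from dataclasses import dataclass

# Matches SRT timestamps like "00:01:02,500" (a period before the millis is also accepted).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")
_TAG = re.compile(r"<[^>]+>")  # simple formatting tags such as <i>...</i>


@dataclass
class Segment:
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    text: str     # subtitle text shown between start and end


def _to_seconds(stamp: str) -> float:
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Segment]:
    """Split SRT text into timed segments (hypothetical helper)."""
    segments = []
    content = content.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # Each cue is: an index line, a "start --> end" line, then 1+ text lines.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start_str, end_str = lines[timing].split("-->", 1)
        # Some files put positioning info after the end time; keep only the timestamp.
        end_stamp = end_str.split()[0]
        text = _TAG.sub("", " ".join(lines[timing + 1:])).strip()
        if text:
            segments.append(Segment(_to_seconds(start_str), _to_seconds(end_stamp), text))
    return segments


def to_sample_ranges(segments: list[Segment], sample_rate: int = 16_000) -> list[tuple[int, int, str]]:
    """Convert seconds to sample indices so each caption can be cut from the audio track."""
    return [(round(s.start * sample_rate), round(s.end * sample_rate), s.text) for s in segments]


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\n<i>Hello there.</i>\n"
    for pair in to_sample_ranges(parse_srt(sample)):
        print(pair)  # prints (16000, 40000, 'Hello there.')
```

This only gets you rough pairs: subtitles are often condensed versions of what is actually said and their timings are approximate, so a real pipeline would probably need to align or filter them further before training on them.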
So with little effort, our Googleplex or Cupertino overlords could train their voice recognition computers to understand fo shizzle without stern words from Siri about speaking more clearly.
Television to the rescue, whodathunkit?