Voice recognition, sometimes also called speech recognition, automatic speech recognition (ASR), or speech-to-text, is the process by which a computer recognizes spoken language and converts it into written text. Once the computer has recognized the text, it can insert it into a document (called dictation) or interpret it as a command (called voice control).
Research into speech recognition technology began at Bell Laboratories in the 1950s, when Bell was hoping to eliminate the cumbersome method of dialing rotary telephones. The company hoped to create a system that would let telephone users speak the digits they wished to dial into the handset and have the computer dial the numbers for them. However, computers of the time weren’t quite up to the task: for example, the Audrey system developed by Bell could only recognize digits, not words or phrases. Further, it could only understand one voice at a time, and each new person had to re-train the machine to understand their voice.
While organizations like IBM, Carnegie Mellon University, and the U.S. Department of Defense would continue research into voice recognition throughout the 1960s and 1970s (with a special interest in automatically transcribing wiretaps and other espionage recordings), it wouldn’t be until the 1990s that computer hardware would become powerful enough to make voice recognition practical for users with disabilities.
Early systems, largely developed for DOS computers, could only understand several thousand words (for context, there are about 171,000 words in the English language) and were unable to recognize the boundaries between words. To use these early voice recognition tools, users had to train the tool on each word and then leave a long pause between every single word while speaking, so that the program could recognize each word individually.
This cumbersome process wouldn’t change until 1997, when more powerful computers and advances in AI allowed the first release of Dragon NaturallySpeaking, a program that could recognize the boundaries between words itself, thus allowing users to speak at a more normal pace and intonation.
Today, voice recognition systems developed by companies like Google, Microsoft, Apple, and others can achieve well over 90 percent speech recognition accuracy, can recognize millions of words in multiple languages, and can even insert punctuation marks automatically based on the user’s intonation and the length of their pauses.
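To make a figure like "90 percent accuracy" concrete, here is a minimal sketch of one common way such a number can be computed: word accuracy, defined as one minus the word error rate (WER). The text above does not say how its accuracy figure was measured, so this metric is an assumption, and the function names and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are compared case-insensitively after
    splitting on whitespace; real evaluations usually normalize
    punctuation and numbers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """One minus WER; can drop below zero if the hypothesis has many extra words."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Hypothetical example: what was said versus what a recognizer produced.
spoken = "please call the doctor at nine tomorrow and confirm it"
recognized = "please call the doctor at night tomorrow and confirm it"
print(f"word accuracy: {word_accuracy(spoken, recognized):.0%}")
```

In this example one of the ten spoken words is misrecognized, so the word accuracy comes out at 90 percent. By this measure, a system that is "well over 90 percent" accurate gets fewer than one word in ten wrong.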
Voice recognition has also expanded far beyond users with disabilities: Google Home, Alexa, and Siri are examples of how an alternative method of input can make products more customizable, more flexible, and easier to use for everyone.