What is speech recognition software and who uses it?
Speech recognition software such as Dragon is often sold as a dictation tool for professionals like doctors and lawyers, but it has a secondary advantage: it allows people with physical disabilities to participate in the ever-increasing community of the Internet. Dragon has expanded that participation by releasing upgrades every couple of years that add more advanced abilities for its users. Corporations and government agencies also procure the software for physically disabled employees who cannot otherwise use a computer because they have little or no movement in their upper extremities. Dragon is moving quickly into the world of AI, with a growing ability to learn how someone speaks, where they put inflections, and even the slang they use regularly.
How does Dragon NaturallySpeaking work?
Through speech input, people who cannot move their arms or hands can type, cut and paste, or scroll using their own voice. Users may choose the type of microphone that works best for them. Some are comfortable with a standard wired headset, while others prefer the ease and freedom of a Bluetooth headset. A third option is a stationary, desk-mounted mic.
After selecting a microphone, users can dictate anything into almost any application using Dragon NaturallySpeaking. The commands are derived from commands in Word; users speak phrases such as “paragraph,” “capitalize-that,” and “exclamation point.” Users need only practice speaking text so that the program can learn the speech patterns and vocabulary particular to each person. Dragon comes with a basic dictionary, but its accuracy improves with repeated use. It adds to its dictionary by learning from a user’s phrasing, jargon, and even accent. The more the software is used, the fewer corrections need to be made. Nuance also offers several specialized Dragon programs for people in specific employment fields, such as medicine and law. These versions have dictionaries with the more precise language commonly used in those fields.
Given that a mouse is a primary tool used to navigate the Internet, Dragon has also developed its own adaptation of one. This is where the MouseGrid command is used. When spoken into the microphone, MouseGrid displays a 3 x 3 grid on the screen that is broken into equal blocks. Each block is numbered 1 through 9. When the user chooses the numbered block closest to where they want to place the pointer, a new, smaller grid with smaller numbered blocks appears in that area. The user continues to narrow down the grids until the cursor is placed directly over the spot they’d like to “click.”
This may seem very time consuming; however, with practice it becomes quite quick. As websites are visited more frequently, the sequences of grid numbers needed to reach certain points become easier to remember. Without MouseGrid, a high percentage of the Internet would not be accessible to those who can only use speech recognition as their computer navigation system.
Example of the MouseGrid command being used on a Wikipedia page. Photo from Nuance.com
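The MouseGrid narrowing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nuance's implementation: the `Rect` class and the `sub_block` and `narrow` functions are hypothetical names, and the sketch assumes blocks are numbered left to right, top to bottom, like a phone keypad.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A rectangular screen region in pixels (hypothetical helper type)."""
    left: float
    top: float
    width: float
    height: float

    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)


def sub_block(rect, number):
    """Return the block numbered 1-9 inside rect.

    Assumes numbering runs left to right, top to bottom.
    """
    if not 1 <= number <= 9:
        raise ValueError("block number must be between 1 and 9")
    row, col = divmod(number - 1, 3)
    block_width = rect.width / 3
    block_height = rect.height / 3
    return Rect(
        rect.left + col * block_width,
        rect.top + row * block_height,
        block_width,
        block_height,
    )


def narrow(screen, spoken_numbers):
    """Apply each spoken number in turn and return the final pointer position."""
    rect = screen
    for number in spoken_numbers:
        rect = sub_block(rect, number)
    return rect.center()


# Saying "1" and then "9" on a 900 x 900 screen narrows to a 100 x 100 block.
print(narrow(Rect(0, 0, 900, 900), [1, 9]))
```

Each spoken number shrinks the target area to one ninth of its previous size, which is why a handful of numbers is usually enough to reach a link or button.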
Another important feature of Dragon is the dictation box. It is used when a field on a webpage cannot be completed using speech recognition because the input field is not recognized. Typically this happens when fields are not labelled appropriately in the code. The dictation box appears on command and provides users with a workaround: the user dictates the desired text into the box and then transfers it to the field by pasting. Using the dictation box is not difficult, but it adds extra steps that become time consuming, especially when completing multiple tasks. It is preferable to dictate text directly into a field rather than paste it in with this workaround.
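Since unlabelled fields are the usual reason a user falls back on the dictation box, a quick check for them can be useful during development. The following is a rough sketch using Python's standard `html.parser` module; `LabelAudit` and `unlabeled_fields` are hypothetical names, and the heuristic (a matching `<label for>` or an `aria-label`/`aria-labelledby` attribute) is an assumption about what counts as "labelled", not a description of how Dragon itself detects fields. It does not handle inputs wrapped inside a `<label>` element.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Collects form fields and the ids that <label for="..."> elements point at."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag in ("input", "textarea", "select"):
            # Skip controls that never take dictated text.
            if attrs.get("type") in ("hidden", "submit", "button"):
                return
            self.fields.append(attrs)


def unlabeled_fields(html):
    """Return the name (or id) of each field with no label or ARIA label."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    missing = []
    for attrs in audit.fields:
        has_label = attrs.get("id") in audit.label_targets
        has_aria = bool(attrs.get("aria-label") or attrs.get("aria-labelledby"))
        if not (has_label or has_aria):
            missing.append(attrs.get("name") or attrs.get("id") or "(unnamed)")
    return missing


page = (
    '<label for="email">Email</label><input id="email" name="email">'
    '<input name="phone">'
)
print(unlabeled_fields(page))
```

Any field the function reports is a candidate for an explicit label, which would let users dictate into it directly.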
In order to better understand and communicate with someone who uses Dragon regularly, it’s necessary to know a few key words and phrases significant in the software. This list is by no means all-inclusive; however, it will be beneficial in creating a website that is accessible to people with different means of Internet navigation. There are more terms that can be used as users become more proficient in dictating and developing their own styles of writing out loud, but these are some of the basic commands you may see referred to, or hear:
- Wake-up: Command used to activate the microphone, to be ready for dictation.
- Go-to-sleep: Command used to deactivate the microphone, to go to resting mode in wait for activation.
- Dictation box: Small box (similar to Windows Notepad) that appears on command and is used to dictate text that cannot be dictated directly into a field on a website.
- MouseGrid: A 3 x 3 grid used to move the cursor around the screen as a mouse would, giving the full functions of a mouse.
- Macros: Shortcuts established to complete fill-ins and aid in navigation.
- Tab: Command sometimes used to move from one field to another on a website, allowing the user to skip content and reach specific destinations.
- Click/Double-click: Two different commands that function as a mouse click and a mouse double-click.
- Mouse-click: Serves as a shortcut command for a single click.
- Correct-that: Command to bring up correction box showi