Windows Speech Recognition

Windows Speech Recognition is a speech recognition application included in Windows Vista and, more recently, Windows 7.


Figure: Windows Speech Recognition tutorial in Windows Vista.

Windows Speech Recognition allows the user to control the computer by giving specific voice commands. The program can also be used to dictate text, so that a user of Windows Vista or Windows 7 can both issue commands and enter text by voice.

Applications that do not present obvious "commands" can still be controlled by asking the system to overlay numbers on top of interface elements; speaking a number then activates the corresponding element. Programs that need mouse clicks in arbitrary locations can also be controlled through speech: on request, a "mousegrid" of nine zones is displayed, with a number inside each. The user speaks a number, and another grid of nine zones is placed inside the chosen zone. This continues until the chosen zone is small enough to single out the interface element to be clicked.
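The repeated subdivision performed by the mousegrid can be illustrated with a short sketch. This is not Microsoft's implementation; it assumes that zones are numbered 1 to 9 from left to right and top to bottom, and that the click eventually lands at the center of the final zone. The names `subdivide` and `mousegrid_target` are hypothetical.

```python
from typing import Iterable, Tuple

# A rectangle as (left, top, width, height), in pixels.
Rect = Tuple[float, float, float, float]


def subdivide(rect: Rect, zone: int) -> Rect:
    """Return the sub-rectangle for a spoken zone number.

    Assumption: zones are numbered 1..9, left to right, top to bottom.
    """
    if not 1 <= zone <= 9:
        raise ValueError("zone must be between 1 and 9")
    left, top, width, height = rect
    row, col = divmod(zone - 1, 3)
    zone_w, zone_h = width / 3, height / 3
    return (left + col * zone_w, top + row * zone_h, zone_w, zone_h)


def mousegrid_target(screen: Rect, spoken_zones: Iterable[int]) -> Tuple[float, float]:
    """Narrow the grid once per spoken number and return the final zone's center.

    Assumption: the click is placed at the center of the last chosen zone.
    """
    rect = screen
    for zone in spoken_zones:
        rect = subdivide(rect, zone)
    left, top, width, height = rect
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    # Speak "1", then "9", on a 900 x 900 pixel area.
    print(mousegrid_target((0, 0, 900, 900), [1, 9]))
```

Each spoken number shrinks the target area to one ninth of its previous size, so a few numbers are enough to narrow a full screen down to a region about the size of a small button.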

Windows Speech Recognition has a fairly high recognition accuracy and provides a set of commands that assists in dictation.[citation needed] A brief speech-driven tutorial is included to help familiarize the user with speech recognition commands. Additional training can also be completed to improve recognition accuracy.

Currently, the application supports several languages, including English (U.S. and British), Spanish, German, French, Japanese and Chinese (traditional and simplified).


In 1993, Microsoft hired Xuedong Huang from Carnegie Mellon University to lead its speech efforts. Microsoft has been involved in research on both speech recognition and text-to-speech. The company's research eventually led to the development of the Speech API (SAPI).

Speech recognition technology has been used in several of Microsoft's products, including Microsoft Dictation (a research prototype that ran on Windows 9x). It was also included in Office XP, Office 2003, Microsoft Plus! for Windows XP, Windows XP Tablet PC Edition, and Windows Mobile (as Microsoft Voice Command). Prior to Windows Vista, however, speech recognition was not a mainstream feature. Windows Speech Recognition was bundled with Windows Vista, released in 2006, making that operating system the first mainstream version of Microsoft Windows to offer fully integrated support for speech recognition.

The use of Windows Speech Recognition during a demonstration of Windows Vista at a Microsoft Financial Analyst Meeting on July 27, 2006, resulted in a well-publicized and embarrassing incident. The software initially failed to function correctly, producing the unintended output "Dear aunt, let's set so double the killer delete select all". A developer on Vista's speech recognition team later explained that the failure was the result of a bug in the volume control feature, which caused the application to pick up extra noise that degraded its performance. Microsoft fixed the bug before Vista was released to the general public.

Windows Speech Recognition relies on Microsoft SAPI version 5.3 (included in Windows Vista) to function. The application also uses Microsoft Speech Recognizer 8.0 for Windows as its speech profile engine.

In 2007, reports surfaced that Windows Speech Recognition could be used to remotely access or control a user's computer. In theory, playing a pre-recorded message containing Windows Speech Recognition commands through a computer's speakers could cause that computer to carry out the spoken tasks. This issue was one of the first Vista vulnerabilities to surface after the release of the operating system to the general public.

Microsoft has officially acknowledged the vulnerability but considers that it does not present a serious threat: even if an attacker did successfully exploit the flaw, they would not be able to perform any function restricted by the access rights of the user, and such restricted functions likely include administrative tasks. In Windows 7, the concern is also addressed by a user-configurable option to enable or disable voice activation of speech recognition.