BlackWasp™
Audio
.NET 3.0+

Speech Recognition

Later versions of Microsoft Windows include a speech recognition engine called "Windows Desktop Speech". The engine is available to .NET Framework developers through the types defined in the System.Speech.Recognition namespace.

System.Speech.Recognition Namespace

Microsoft Windows includes a speech recognition system known as Windows Desktop Speech. It is available in Windows Vista and later releases of Windows, and in Windows XP with Service Pack 3 installed. The speech recognition engine digitises sound and extracts words or phrases from the processed data.

Developers can use the Windows Desktop Speech engine from the .NET Framework version 3.0 or later. All of the required classes and methods are provided in the System.Speech.Recognition namespace, which can be used in several ways. For example, you can perform synchronous or asynchronous speech recognition, use the provided grammars to allow dictation, or create custom grammars suited to voice control of applications.

This article serves as a basic introduction to speech recognition using C#. We will create a sample program that recognises English phrases and outputs them to the console until you say "Stop".

Creating the Sample

To create the sample program, start by creating a console application. To enable access to the speech recognition engine, add a reference to System.Speech.dll. To simplify the code, add the following using directive to the top of the automatically created class file:

using System.Speech.Recognition;

Initialising the Speech Recognition Engine

The primary class that we will use is SpeechRecognitionEngine. This allows you to create discrete, in-process speech recognition engines for use only by your application. We will instantiate the engine in the Main method with the default constructor, which selects the default recogniser. The class implements IDisposable, so we will wrap the object in a using statement to ensure that it is disposed of correctly.

using (var engine = new SpeechRecognitionEngine())
{
}
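If you are unsure which recogniser the default constructor will pick, it can help to see what is installed. The following is a minimal sketch, separate from the sample program, that lists the installed recognisers using the static InstalledRecognizers method. The class name ListRecognisers is an assumption made for illustration.

```csharp
using System;
using System.Speech.Recognition;

// Illustrative helper class; not part of the article's sample program.
class ListRecognisers
{
    static void Main()
    {
        // InstalledRecognizers returns a RecognizerInfo for each
        // speech recogniser installed on the current machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Show the recogniser's name and the culture that it supports.
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }
    }
}
```

To use a recogniser other than the default, you can pass a CultureInfo or one of these RecognizerInfo objects to an overloaded SpeechRecognitionEngine constructor.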

When using the speech recognition engine you must supply a grammar. Grammars include rules and constraints that determine the words or phrases that can be recognised. You can create your own grammars when you have a specific problem to solve. In this case we will use a built-in grammar designed for dictation. It is provided by the DictationGrammar class.

We can load a new dictation grammar into the engine using the LoadGrammar method, as shown below. Add this to the code block of the above using statement.

engine.LoadGrammar(new DictationGrammar());
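As an alternative to dictation, a custom grammar can restrict recognition to a small set of phrases, which is often a better fit for voice control. The following sketch is not part of the sample program. It builds a grammar from a list of choices, and the command words, the grammar name and the class name are all assumptions chosen for illustration.

```csharp
using System;
using System.Speech.Recognition;

// Illustrative class; the sample program in this article uses DictationGrammar instead.
class CommandGrammarSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Hypothetical command words; replace with phrases that suit your application.
            var commands = new Choices("Start", "Pause", "Stop");

            // Build the grammar for the same culture as the recogniser,
            // as a mismatch prevents the grammar from loading.
            var builder = new GrammarBuilder(commands);
            builder.Culture = engine.RecognizerInfo.Culture;

            var grammar = new Grammar(builder);
            grammar.Name = "Commands"; // hypothetical name, useful when several grammars are loaded

            engine.LoadGrammar(grammar);
            Console.WriteLine("Loaded grammar: {0}", grammar.Name);
        }
    }
}
```

With a grammar like this loaded instead of the dictation grammar, the engine only attempts to match the listed phrases.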

In addition to loading a grammar, we must identify the source of the speech to be recognised. The source can be an audio input device, such as a microphone, or it can be a wave file or another audio stream. We will be using the default audio device, which you can set using the Audio Devices applet in the Control Panel. In most cases this will be a microphone. To use the default device, call the SetInputToDefaultAudioDevice method of the recognition engine:

engine.SetInputToDefaultAudioDevice();
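If no microphone is available, the same engine can read from a wave file instead. The sketch below is separate from the sample program and assumes that a recording exists at the hypothetical path shown. It performs a single synchronous recognition operation with the Recognize method, which returns null when nothing is recognised.

```csharp
using System;
using System.Speech.Recognition;

// Illustrative class; the sample program in this article uses the default audio device.
class WaveFileInputSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());

            // Hypothetical path; point this at a wave file that contains speech.
            engine.SetInputToWaveFile(@"C:\Temp\speech.wav");

            // Recognize blocks until one recognition operation completes.
            RecognitionResult result = engine.Recognize();

            Console.WriteLine(result != null ? result.Text : "(nothing recognised)");
        }
    }
}
```

Reading from a file is also convenient when testing, as the same recording can be replayed to the engine repeatedly.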
27 July 2011