.NET 3.0+

Speech Recognition

by Richard Carr, published at http://www.blackwasp.co.uk/SpeechRecognition.aspx

Later versions of Microsoft Windows include a speech recognition engine called "Windows Desktop Speech". This engine is made available to .NET Framework developers through the types defined in the System.Speech.Recognition namespace.

System.Speech.Recognition Namespace

Microsoft Windows includes a speech recognition system known as Windows Desktop Speech. This is available in Windows Vista and later releases of Windows, in addition to Windows XP with Service Pack 3 installed. The speech recognition engine allows sounds to be digitised and words or phrases to be extracted from the processed data.

The Windows Desktop Speech engine can be used by developers using the .NET Framework version 3.0 or later. All of the classes and methods required are provided in the System.Speech.Recognition namespace. This namespace can be used in many ways. For example, you can perform synchronous or asynchronous speech recognition, you can use the provided grammars to allow dictation, or you can create custom grammars that are suited to voice control of applications.

This article serves as a basic introduction to speech recognition using C#. We will create a sample program that recognises English phrases and outputs them to the console until you say "Stop".

Creating the Sample

To create the sample program, start by creating a console application. To enable access to the speech recognition engine, add a reference to System.Speech.dll. To simplify the code, add the following using directive to the top of the automatically created class file:

using System.Speech.Recognition;

Initialising the Speech Recognition Engine

The primary class that we will use is SpeechRecognitionEngine. This allows you to create discrete, in-process speech recognition engines for use only by your application. We will instantiate our object in the Main method using the default constructor to use the default recogniser. The class implements IDisposable so we will add a using statement to ensure that it is disposed of correctly.

using (var engine = new SpeechRecognitionEngine())
{
}

When using the speech recognition engine you must supply a grammar. Grammars include rules and constraints that determine the words or phrases that can be recognised. You can create your own grammars when you have a specific problem to solve. In this case we will use a built-in grammar designed for dictation. It is provided by the DictationGrammar class.

We can load a new dictation grammar into the engine using the LoadGrammar method, as shown below. Add this to the code block of the above using statement.

engine.LoadGrammar(new DictationGrammar());
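For comparison, a custom grammar restricted to a few command words might look something like the sketch below. It is not part of the sample program. The class name, method name, grammar name and the list of commands are all made up for illustration, and the sketch assumes that the culture of the current thread matches that of the installed recogniser, as a mismatch causes LoadGrammar to fail.

```csharp
using System;
using System.Speech.Recognition;

public class CommandGrammarSketch
{
    // Builds a grammar that accepts exactly one of a fixed set of words.
    // The words themselves are hypothetical examples.
    public static Grammar BuildCommandGrammar()
    {
        var commands = new Choices("open", "close", "save", "stop");
        var builder = new GrammarBuilder(commands);
        return new Grammar(builder) { Name = "commands" };
    }

    public static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Load the custom grammar in place of DictationGrammar.
            engine.LoadGrammar(BuildCommandGrammar());
            engine.SetInputToDefaultAudioDevice();

            // Blocks until a phrase is recognised or the engine gives up.
            RecognitionResult result = engine.Recognize();
            if (result != null) Console.WriteLine(result.Text);
        }
    }
}
```

A small grammar like this only ever reports words from its own list, which tends to suit voice commands better than free dictation does.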

In addition to loading a grammar, we must identify the source of the speech to be recognised. This can be an audio input device such as a microphone, a wave file or another audio stream. We will be using the default audio device, which you can set using the Audio Devices applet in the Control Panel. In most cases this will be a microphone. To use the default device call the SetInputToDefaultAudioDevice method of the recognition engine:

engine.SetInputToDefaultAudioDevice();
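If you would rather test against a recording than a microphone, the engine can read from a wave file instead. The following is a minimal sketch rather than part of the sample program. The class name and the "sample.wav" fallback path are placeholders, and the file is assumed to be in a format that the recogniser supports.

```csharp
using System;
using System.Speech.Recognition;

class WaveFileInputSketch
{
    static void Main(string[] args)
    {
        // Placeholder path: pass a real wave file as the first argument.
        string path = args.Length > 0 ? args[0] : "sample.wav";

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());

            // Read audio from the file rather than the default device.
            engine.SetInputToWaveFile(path);

            // Attempt a single recognition operation against the file.
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognised)");
        }
    }
}
```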

Performing the Speech Recognition

With the engine configured we can begin the speech recognition. First let's create a variable that will hold the results of the recognition operations. This is a RecognitionResult object. We'll be using it repeatedly in a do-while loop, so to begin we just need to declare it and set its value to null.

Add the following as the next line within the using statement's code block:

RecognitionResult result = null;

The RecognitionResult object will store various details about the recognition operation, including the entire phrase in the Text property and a read-only collection of the individual words in the Words property. If no words are recognised, the result will be null, so the loop continues whilst the result is null or its Text property is anything other than "stop", compared without regard to case.

Add the loop within the using statement's code block as follows:

do
{
} while (result == null || result.Text.ToLower() != "stop");
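The comparison in the loop condition only matches when the recognised text is exactly "stop". If you find that the recogniser returns surrounding whitespace or trailing punctuation, the test could be moved into a small helper such as the one sketched below. The StopCommand class and IsStop method are hypothetical names, and whether the dictation grammar actually produces such variations is an assumption you should check on your own system.

```csharp
using System;

static class StopCommand
{
    // Returns true when the text is "stop", ignoring case, surrounding
    // whitespace and any trailing full stop, exclamation or question mark.
    public static bool IsStop(string text)
    {
        if (text == null) return false;
        string trimmed = text.Trim().TrimEnd('.', '!', '?');
        return string.Equals(trimmed, "stop", StringComparison.OrdinalIgnoreCase);
    }
}

class StopCommandDemo
{
    static void Main()
    {
        Console.WriteLine(StopCommand.IsStop("Stop"));     // True
        Console.WriteLine(StopCommand.IsStop(" STOP. "));  // True
        Console.WriteLine(StopCommand.IsStop("stop it"));  // False
    }
}
```

With the helper in place, the loop condition would become `while (result == null || !StopCommand.IsStop(result.Text));`.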

Finally, we can perform a recognition operation and output the result to the console if it is not null. There are several ways to recognise speech, including synchronous and asynchronous options. For simplicity, we will use the synchronous option provided by the Recognize method.

Add the following code within the loop:

result = engine.Recognize();
if (result != null) Console.WriteLine(result.Text);
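Putting the pieces together, the finished program should look something like the listing below. The namespace and class names are assumptions based on a typical console application template, so yours may differ.

```csharp
using System;
using System.Speech.Recognition;

namespace SpeechRecognitionSample
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var engine = new SpeechRecognitionEngine())
            {
                // Use the built-in dictation grammar and the default microphone.
                engine.LoadGrammar(new DictationGrammar());
                engine.SetInputToDefaultAudioDevice();

                RecognitionResult result = null;
                do
                {
                    // Blocks until a phrase is recognised; may return null.
                    result = engine.Recognize();
                    if (result != null) Console.WriteLine(result.Text);
                } while (result == null || result.Text.ToLower() != "stop");
            }
        }
    }
}
```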

Testing the Program

You can now run the program to try the speech recognition facilities provided by .NET and Windows Desktop Speech. The accuracy of recognition can vary. For the best results, ensure that you are using a good quality microphone and that there is as little background noise as possible. When attempting to recognise phrases, rather than single words, you may find that speaking naturally yields poor results. If so, try speaking in a staccato fashion, leaving a brief pause between each word.
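When judging accuracy it can help to see how confident the engine was in each result. The sketch below is a variation on the sample, not part of it, that prints the Confidence property of the result and of each entry in its Words collection. The class and method names are illustrative only.

```csharp
using System;
using System.Speech.Recognition;

class ConfidenceSketch
{
    // Writes the phrase and each word with its confidence score.
    static void PrintWithConfidence(RecognitionResult result)
    {
        Console.WriteLine("{0} (confidence {1:F2})", result.Text, result.Confidence);
        foreach (RecognizedWordUnit word in result.Words)
        {
            Console.WriteLine("  {0}: {1:F2}", word.Text, word.Confidence);
        }
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToDefaultAudioDevice();

            // Recognise a single phrase and report the scores.
            RecognitionResult result = engine.Recognize();
            if (result != null) PrintWithConfidence(result);
        }
    }
}
```

Low scores for particular words may suggest where background noise or pronunciation is causing problems.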

27 July 2011