BlackWasp
.NET Framework
.NET 1.1+

Character Testing with the Char Structure

When developing software that may be deployed in many countries, it is important to be able to determine the type of a character whatever language or script it comes from. The Char structure provides methods that simplify this process by classifying characters according to their Unicode categories.

Char Structure

The Char structure is used to represent a Unicode character, often a single character from a string. Although it is a very simple type, it includes several static methods that are very useful for testing the type of character that it holds. These methods, each with the prefix "Is", can be used to detect whether a character is a letter, digit, symbol or white space character. They are based on the Unicode character categories, making them especially useful for software that will be deployed internationally.

Checking for Numeric and Alphanumeric Characters

In this article we will use some of the common methods of the Char structure to test characters. The classification is based on Unicode categories rather than the current culture, but which of the matching characters can actually be displayed depends on the configuration of your operating system and the fonts installed. To provide examples that will function correctly for all readers, we will create a simple Windows Forms application that displays characters in a list box. To begin, create a new Windows Forms project. Add a list box named "CharacterList" to the form. The sample code below can be used within the form's Load event.

IsDigit

A very common requirement when dealing with user input is to check if a character is a numeric digit. To perform this check, you can use the IsDigit method. The method has two overloaded versions. The first accepts the character to be tested as a parameter and returns a Boolean value.

To demonstrate, add the following code to the form's Load event handler. The code loops through the character values from U+0000 to U+FFFE and adds those that are categorised as numeric digits to the list box.

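// Test every value from U+0000 to U+FFFE. The loop stops before char.MaxValue
// (U+FFFF) because a condition of c <= char.MaxValue could never become false.
for (char c = char.MinValue; c < char.MaxValue; c++)
{
    if (char.IsDigit(c)) CharacterList.Items.Add(c);
}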

The characters output on a standard UK English configuration of Windows are shown below. Note that in addition to the digits 0 to 9, the method identifies decimal digits from other scripts, such as the Arabic-Indic and Devanagari digits.

[Image: Char.IsDigit Characters]
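
Because the list box can only show characters for which a font is available, it can also help to see the matching characters as code points. The following is a minimal console sketch rather than part of the original sample. The DigitLister class and GetDigitCodePoints method are hypothetical names, and the generic List type assumes .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;

public static class DigitLister
{
    // Hypothetical helper: returns the code point of every char below
    // char.MaxValue that char.IsDigit classifies as a decimal digit.
    public static List<int> GetDigitCodePoints()
    {
        List<int> codePoints = new List<int>();
        for (char c = char.MinValue; c < char.MaxValue; c++)
        {
            if (char.IsDigit(c)) codePoints.Add(c);
        }
        return codePoints;
    }

    public static void Main()
    {
        List<int> digits = GetDigitCodePoints();
        Console.WriteLine("{0} digit characters found", digits.Count);

        // Print the first twenty as hexadecimal code points, e.g. U+0030 for '0'.
        for (int i = 0; i < digits.Count && i < 20; i++)
        {
            Console.WriteLine("U+{0:X4}", digits[i]);
        }
    }
}
```

The ten ASCII digits, U+0030 to U+0039, appear first. The total count is likely to differ between framework versions, as each is built on a particular version of the Unicode data.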

The second version of the IsDigit method can be used to test a single character within a string. In this case, the first argument is the string and the second is the index of the character to be checked. For example:

Console.WriteLine(char.IsDigit("ABC1", 3)); // Outputs "True"

NB: All of the methods described in this article include both overloaded versions.
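
The string and index overload is convenient when validating a whole string one position at a time. The sketch below assumes a hypothetical helper named IsAllDigits, which is not part of the framework, and treats a null or empty string as invalid.

```csharp
using System;

public static class DigitCheck
{
    // Hypothetical helper: true when every character in text is a decimal digit.
    public static bool IsAllDigits(string text)
    {
        if (text == null || text.Length == 0) return false;

        for (int i = 0; i < text.Length; i++)
        {
            // Uses the (string, int) overload to test the character at index i.
            if (!char.IsDigit(text, i)) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllDigits("2009"));         // True
        Console.WriteLine(IsAllDigits("20O9"));         // False: contains the letter O
        Console.WriteLine(IsAllDigits("\u0663\u0664")); // True: Arabic-Indic digits three and four
    }
}
```

Note that the last call succeeds. If only the characters 0 to 9 are acceptable, a range comparison such as c >= '0' && c <= '9' would be needed instead.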

IsLetter

The IsLetter method allows you to test whether a character is a letter. Digits, punctuation and white space are not classified as letters but letters from non-English alphabets are. Modify the line within the Load event handler's loop as follows to see the set of letter characters. On executing the program you should see several tens of thousands of results.

if (char.IsLetter(c)) CharacterList.Items.Add(c);

IsLetterOrDigit

The IsLetterOrDigit method provides a combination of the above two methods. This member returns true for any letter or digit character.

if (char.IsLetterOrDigit(c)) CharacterList.Items.Add(c);
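
One possible use of IsLetterOrDigit is stripping everything except alphanumeric characters from a string. This is a minimal sketch only; the TextCleaner class and KeepLettersAndDigits method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper: copies only letters and digits, dropping everything else.
    public static string KeepLettersAndDigits(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c)) result.Append(c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        // \u00E9 is the accented letter e-acute, which is classified as a letter.
        string cleaned = KeepLettersAndDigits("Order #42: caf\u00E9!");
        Console.WriteLine(cleaned == "Order42caf\u00E9"); // True
    }
}
```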

IsNumber

The IsNumber method differs from IsDigit because, in addition to the decimal digits, it includes characters that represent numbers but are not single digits. Number characters include fraction symbols, subscript and superscript numbers, digits embedded within symbols and Roman numerals.

if (char.IsNumber(c)) CharacterList.Items.Add(c);
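
The difference between the two methods can be seen by testing a few specific characters. The following console sketch is illustrative only; the sample characters were chosen for this example and are written as Unicode escapes so that the source file encoding does not matter.

```csharp
using System;

public static class NumberComparison
{
    public static void Main()
    {
        // '7', superscript two, vulgar fraction one half, Roman numeral eight.
        char[] samples = { '7', '\u00B2', '\u00BD', '\u2167' };

        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4}  IsDigit={1}  IsNumber={2}",
                (int)c, char.IsDigit(c), char.IsNumber(c));
        }
        // U+0037  IsDigit=True  IsNumber=True
        // U+00B2  IsDigit=False  IsNumber=True
        // U+00BD  IsDigit=False  IsNumber=True
        // U+2167  IsDigit=False  IsNumber=True
    }
}
```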

IsLower

The IsLower method returns true for characters that are designated as lower case letters. It returns false for upper case letters, for letters that are considered neither upper case nor lower case, and for characters that are not letters.

if (char.IsLower(c)) CharacterList.Items.Add(c);

IsUpper

The IsUpper method is used to detect upper case letters. As with IsLower, letters that do not have a designated case generate a false result.

if (char.IsUpper(c)) CharacterList.Items.Add(c);
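
To illustrate letters without a designated case, the sketch below compares a lower case letter, an upper case letter and the Hebrew letter alef, which belongs to a script that has no case. The choice of sample characters is an assumption made for this example.

```csharp
using System;

public static class CaseComparison
{
    public static void Main()
    {
        // 'a', 'A' and Hebrew letter alef (U+05D0), a letter with no case.
        char[] samples = { 'a', 'A', '\u05D0' };

        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4}  IsLetter={1}  IsLower={2}  IsUpper={3}",
                (int)c, char.IsLetter(c), char.IsLower(c), char.IsUpper(c));
        }
        // U+0061  IsLetter=True  IsLower=True  IsUpper=False
        // U+0041  IsLetter=True  IsLower=False  IsUpper=True
        // U+05D0  IsLetter=True  IsLower=False  IsUpper=False
    }
}
```

This means that !char.IsLower(c) is not the same test as char.IsUpper(c), even when the character is known to be a letter.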

Checking for Symbols and White Space

IsPunctuation

The IsPunctuation method allows you to check if a character is a valid punctuation mark. All other characters cause a false result.

if (char.IsPunctuation(c)) CharacterList.Items.Add(c);

IsSymbol

The IsSymbol method returns true for a separate and much larger set of characters than IsPunctuation. No character is matched by both methods. The matching characters include common symbols such as the dollar, plus and equals signs, technical, mathematical and geometric symbols, arrows, Braille characters and dingbats.

if (char.IsSymbol(c)) CharacterList.Items.Add(c);
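
Which common keyboard characters count as punctuation and which as symbols is not always obvious. The following sketch classifies a handful of ASCII characters; the SymbolCheck class and Describe method are hypothetical names used only for this example.

```csharp
using System;

public static class SymbolCheck
{
    // Hypothetical helper: names the group that a character falls into.
    public static string Describe(char c)
    {
        if (char.IsPunctuation(c)) return "punctuation";
        if (char.IsSymbol(c)) return "symbol";
        return "other";
    }

    public static void Main()
    {
        foreach (char c in "!?-#$+=^")
        {
            Console.WriteLine("{0}  {1}", c, Describe(c));
        }
        // !  punctuation
        // ?  punctuation
        // -  punctuation
        // #  punctuation
        // $  symbol
        // +  symbol
        // =  symbol
        // ^  symbol
    }
}
```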

IsSeparator

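The IsSeparator method is used to detect characters in the Unicode separator categories: spaces of various widths, the line separator (U+2028) and the paragraph separator (U+2029). Control characters such as tabs, carriage returns and line feeds are not classified as separators. The sample code below will execute this method. However, as separators are non-printing characters, the output will be minimal.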

if (char.IsSeparator(c)) CharacterList.Items.Add(c);

IsWhiteSpace

The final method that we will examine in this article is named "IsWhiteSpace". As the name suggests, the method is used to test for white space characters. These include spaces of varying sizes, the line and paragraph separators, and control characters such as tabs, carriage returns and line feeds.

if (char.IsWhiteSpace(c)) CharacterList.Items.Add(c);
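
As both IsSeparator and IsWhiteSpace mostly match non-printing characters, the list box is of limited help in comparing them. The console sketch below prints code points instead. The sample characters are an assumption chosen for this example.

```csharp
using System;

public static class WhiteSpaceComparison
{
    public static void Main()
    {
        // Space, tab, line feed, no-break space and the Unicode line separator.
        char[] samples = { ' ', '\t', '\n', '\u00A0', '\u2028' };

        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4}  IsSeparator={1}  IsWhiteSpace={2}",
                (int)c, char.IsSeparator(c), char.IsWhiteSpace(c));
        }
        // U+0020  IsSeparator=True  IsWhiteSpace=True
        // U+0009  IsSeparator=False  IsWhiteSpace=True
        // U+000A  IsSeparator=False  IsWhiteSpace=True
        // U+00A0  IsSeparator=True  IsWhiteSpace=True
        // U+2028  IsSeparator=True  IsWhiteSpace=True
    }
}
```

When the aim is to find any gap between words or lines in user input, IsWhiteSpace is usually the more suitable of the two.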

31 October 2009