BlackWasp™

Regular Expressions
.NET 1.1+

Regular Expression Character Classes

The fourth part of the Regular Expressions in .NET tutorial continues to look at the characters used within a regular expression. This article describes character classes, which allow the creation of patterns that are not restricted to matching only literal characters.

Character Classes

So far in the regular expressions tutorial we have matched patterns based only on literal characters and escape sequences. This has allowed a demonstration of some of the key methods provided by the regular expressions engine, and the Regex class, but offers little benefit over searching using the string class. The real power of regular expressions is provided by the ability to find text that matches a more flexible pattern.

In this article we'll start to look at the ways in which you can create a pattern that matches more than literal strings. We'll begin by looking at character classes, which are sometimes known as character sets. They allow you to specify that the character at a given position in the regular expression may match any of several possible characters at the corresponding position in the search string. Common uses of a character class are to match any numeric digit, or to match any character at all. They are also useful for matching misspelled words or those that have multiple, similar spellings, such as "serialise" and "serialize".

Wildcards

The first of the character class symbols is the full stop, or period (.). By default, this wildcard matches any single character in the source text except a line feed (\n). It is useful when you want to match a string containing literal characters in specific positions but where the characters between those literals can be anything.

In the example code below, the input variable is matched against a regular expression containing three wildcards. Wherever an 'r' is followed by an 's' with exactly three characters in between, a match is found. Note that the two literal letters are case-sensitive, so the match operation finds "rings" and "ruins" but not "Rains".

// Requires: using System; using System.Text.RegularExpressions;
string input = "Rains ran rings around the Roman ruins";

foreach (Match match in Regex.Matches(input, "r...s"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
  
Matched 'rings' at index 10
Matched 'ruins' at index 33
  
*/
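
One detail worth checking for yourself is how the wildcard treats line breaks. In .NET, the full stop does not match a line feed unless the RegexOptions.Singleline option is supplied. The following is a minimal sketch of that behaviour; the class name WildcardNewlineDemo, the helper method Matches and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative only.
public static class WildcardNewlineDemo
{
    // Returns true if the pattern matches anywhere in the input.
    public static bool Matches(string input, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    public static void Main()
    {
        string sameLine = "r-a-s";    // three ordinary characters between 'r' and 's'
        string splitLine = "r-\n-s";  // the middle character is a line feed

        // Ordinary characters are matched by the wildcard.
        Console.WriteLine(Matches(sameLine, "r...s", RegexOptions.None));        // True

        // By default the wildcard refuses to match the line feed.
        Console.WriteLine(Matches(splitLine, "r...s", RegexOptions.None));       // False

        // With Singleline, the wildcard matches the line feed too.
        Console.WriteLine(Matches(splitLine, "r...s", RegexOptions.Singleline)); // True
    }
}
```

If your source text may span several lines, decide whether a match should be allowed to cross a line break before relying on the wildcard.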

Character Groups

Character groups act in a similar manner to wildcards, allowing you to match a single character in the source string to one of several possible values. Instead of matching any character, you specify exactly which characters are considered as valid matches.

To create a character group, you enclose all acceptable matching characters in square brackets. For example, if you wanted to match only lower case vowels, you could use the character group "[aeiou]".

Let's modify the previous example so that it finds all sequences of five characters that start with 'R' or 'r', and end with 's', removing the case-sensitivity of the first letter:

string input = "Rains ran rings around the Roman ruins";

foreach (Match match in Regex.Matches(input, "[Rr]...s"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
  
Matched 'Rains' at index 0
Matched 'rings' at index 10
Matched 'ruins' at index 33
            
*/
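
Character groups also suit the alternative spellings mentioned at the start of the article. The sketch below assumes a pattern of "seriali[sz]e" to accept either "serialise" or "serialize"; the class name SpellingVariantsDemo, the helper method FindAll and the sample sentence are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative only.
public static class SpellingVariantsDemo
{
    // Collects the text of every match of the pattern in the input.
    public static List<string> FindAll(string input, string pattern)
    {
        List<string> results = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            results.Add(match.Value);
        }
        return results;
    }

    public static void Main()
    {
        // The third variant is a deliberate misspelling that should be ignored.
        string input = "You can serialise or serialize, but not serialixe.";

        // The group [sz] accepts either letter at the eighth position.
        foreach (string value in FindAll(input, "seriali[sz]e"))
        {
            Console.WriteLine("Matched '{0}'", value);
        }
    }
}

/* OUTPUT

Matched 'serialise'
Matched 'serialize'

*/
```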

Negation

You can invert the operation of a character group by including a caret symbol (^) immediately after the opening bracket. A negated character group matches any single character that does not appear within the brackets. For example, to find any character that is not a lower case vowel, you could use "[^aeiou]". Note that this also matches characters that are not letters, such as spaces, digits and punctuation.

The following example finds five-character matches that start with 'R' or 'r' and end with 's' where the third character is not a vowel. The result is a single match.

string input = "Rains ran rings around the Roman ruins";

foreach (Match match in Regex.Matches(input, "[Rr].[^aeiou].s"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
  
Matched 'rings' at index 10
            
*/
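
Two points about negation are easy to overlook: a negated group matches anything not listed, including upper case letters, digits and punctuation, and the caret only negates when it is the first character inside the brackets. The following sketch illustrates both; the class name NegatedGroupDemo, the helper method Collect and the sample strings are assumptions made for this example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative only.
public static class NegatedGroupDemo
{
    // Joins the text of every match into a single string.
    public static string Collect(string input, string pattern)
    {
        StringBuilder builder = new StringBuilder();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            builder.Append(match.Value);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Only the lower case 'e' is excluded; 'A', '-', '1' and 'b' all match.
        Console.WriteLine("[{0}]", Collect("Ae-1b", "[^aeiou]"));

        // A caret that is not first in the group is a literal caret,
        // so this group matches either 'a' or '^'.
        Console.WriteLine("[{0}]", Collect("a^b", "[a^]"));
    }
}

/* OUTPUT

[A-1b]
[a^]

*/
```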
13 September 2015