BlackWasp™


Regular Expressions
.NET 1.1+

Regular Expression Quantifiers

The seventh part of the Regular Expressions in .NET tutorial continues to look at the pattern-matching characters that can be used in regular expressions. This article describes quantifiers, which allow you to match items that repeat in the source text.

Quantifiers

So far in the regular expressions tutorial we've seen how literals and special characters can be combined to create a pattern that can be matched within a string. In almost all cases, those patterns have only been able to produce matches of a fixed length.

Quantifiers allow you to define a pattern containing characters, or groups of characters, that repeat. They make much more powerful regular expressions possible, including patterns with optional characters, text that must appear but may repeat, and text that may either repeat or be missing entirely. You can even specify the number of times that an item must repeat in order to be accepted as a match.

In this article we'll see examples of all of the quantifiers, including a regular expression that accurately matches IP addresses.

Optional Matching

In the article describing anchors in regular expressions, we saw the question mark character (?) used to optionally match a carriage return. This is the first of the quantifiers. It specifies that the item immediately preceding the question mark must be matched either zero times or once.

The example below shows the quantifier in use. Here the word "colour" is found with or without the letter 'u'.

string input = "The USA spelling of colour is color.";

foreach (Match match in Regex.Matches(input, "colou?r"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
    
Matched 'colour' at index 20
Matched 'color' at index 30
             
*/
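Because the question mark makes only the 'u' optional, the same pattern can also validate a single word. The sketch below is an illustration rather than part of the original tutorial: it assumes a C# 9 or later compiler (for top-level statements and local functions), and IsColourSpelling is a hypothetical helper name. The ^ and $ anchors from the earlier article force the whole word to match, so extra or missing letters elsewhere cause a failure.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (assumes C# 9+ top-level statements).
// ^ and $ require the entire word to match; only the 'u' may be absent.
bool IsColourSpelling(string word) => Regex.IsMatch(word, "^colou?r$");

foreach (string word in new[] { "color", "colour", "colouur", "colr" })
{
    Console.WriteLine("{0,-8} {1}", word, IsColourSpelling(word));
}

// color    True
// colour   True
// colouur  False
// colr     False
```

"colouur" fails because u? allows at most one 'u', and "colr" fails because the "colo" that precedes it is not optional.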

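All of the quantifiers can be applied either to a single item, such as a character, a character class or an escape sequence like \d, or to a pattern contained within parentheses. For example, the code below finds either "fox" or "lazy fox", because the question mark applies to the whole "lazy " group.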

string input = "The quick brown fox jumps over the lazy fox.";

foreach (Match match in Regex.Matches(input, @"(lazy )?fox"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
    
Matched 'fox' at index 16
Matched 'lazy fox' at index 35
             
*/
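The position of the parentheses matters. The following sketch, which again assumes C# 9 or later top-level statements, contrasts the grouped pattern with "lazy ?fox", where the question mark applies only to the space. With that pattern the word "lazy" becomes mandatory, so the first "fox" in the sentence is no longer matched.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "The quick brown fox jumps over the lazy fox.";

// ? applies only to the space, so "lazy" is required in every match.
MatchCollection spaceOptional = Regex.Matches(input, "lazy ?fox");

// ? applies to the whole "lazy " group, so a bare "fox" also matches.
MatchCollection groupOptional = Regex.Matches(input, "(lazy )?fox");

Console.WriteLine("lazy ?fox   -> {0} match(es)", spaceOptional.Count);
Console.WriteLine("(lazy )?fox -> {0} match(es)", groupOptional.Count);

// lazy ?fox   -> 1 match(es)
// (lazy )?fox -> 2 match(es)
```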

Matching Repeating Characters

If you wish to find a repeating character or pattern, you can use the plus sign (+). This specifies that the item immediately preceding it must appear at least once but can appear many times.

The code below uses the pattern "\d+", which finds every run of one or more adjacent numeric digits. Multi-digit numbers such as 13 are therefore returned as a single match.

string input = "1, 1, 2, 3, 5, 8, 13, 21, 34";

foreach (Match match in Regex.Matches(input, @"\d+"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
    
Matched '1' at index 0
Matched '1' at index 3
Matched '2' at index 6
Matched '3' at index 9
Matched '5' at index 12
Matched '8' at index 15
Matched '13' at index 18
Matched '21' at index 22
Matched '34' at index 26
             
*/
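A common use of \d+ is extracting numbers from text. The sketch below, assuming C# 9 or later top-level statements, totals the matched values and then counts the matches produced by \d without the quantifier, which splits 13, 21 and 34 into separate digits. It also assumes the input holds only ASCII digits, as this one does: in .NET, \d can match other Unicode decimal digits, which int.Parse would not accept.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "1, 1, 2, 3, 5, 8, 13, 21, 34";

// Each \d+ match is a complete run of digits, so 13, 21 and 34 arrive whole.
// Assumes ASCII digits only, as in this input.
int total = 0;
foreach (Match match in Regex.Matches(input, @"\d+"))
{
    total += int.Parse(match.Value);
}

// Without the quantifier, every individual digit is a separate match.
int singleDigits = Regex.Matches(input, @"\d").Count;

Console.WriteLine("Sum of numbers: {0}", total);
Console.WriteLine("Single-digit matches: {0}", singleDigits);

// Sum of numbers: 88
// Single-digit matches: 12
```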

Optionally Matching Repeating Characters

The asterisk quantifier (*) also allows you to find a series of repeating elements. However, if the item preceding the asterisk does not appear, a match is still possible. This is because the quantifier matches the item zero or more times.
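The difference between the three quantifiers described so far is easiest to see side by side. This sketch assumes C# 9 or later top-level statements, and Find is a hypothetical helper that joins every match into a comma-separated string. The same input is searched with ?, + and * applied to the letter 'b'.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "ac abc abbc abbbc";

// Hypothetical helper: returns all matches of the pattern, comma-separated.
string Find(string pattern) =>
    string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

Console.WriteLine("ab?c : {0}", Find("ab?c")); // zero or one 'b'
Console.WriteLine("ab+c : {0}", Find("ab+c")); // one or more 'b'
Console.WriteLine("ab*c : {0}", Find("ab*c")); // zero or more 'b'

// ab?c : ac, abc
// ab+c : abc, abbc, abbbc
// ab*c : ac, abc, abbc, abbbc
```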

Consider the following code. The regular expression looks for zero or more capital letters followed by zero or more numeric digits. All five alphanumeric strings in the source text are matched. In addition, a zero-length match is found immediately after each of those strings, at the position of each space and at the end of the input. These are valid matches because the pattern permits zero letters followed by zero digits.

string input = "1234 A123 AB12 ABC1 ABCD";

foreach (Match match in Regex.Matches(input, @"[A-Z]*[0-9]*"))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
    
Matched '1234' at index 0
Matched '' at index 4
Matched 'A123' at index 5
Matched '' at index 9
Matched 'AB12' at index 10
Matched '' at index 14
Matched 'ABC1' at index 15
Matched '' at index 19
Matched 'ABCD' at index 20
Matched '' at index 24
             
*/
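If the zero-length results are not wanted, one simple option is to discard them after matching. The following sketch assumes C# 9 or later top-level statements and uses the LINQ Cast, Where and Select methods to keep only matches whose Length is greater than zero.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "1234 A123 AB12 ABC1 ABCD";

MatchCollection all = Regex.Matches(input, @"[A-Z]*[0-9]*");

// Keep only the matches that consumed at least one character.
string[] nonEmpty = all.Cast<Match>()
    .Where(m => m.Length > 0)
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine("{0} matches, {1} non-empty", all.Count, nonEmpty.Length);
Console.WriteLine(string.Join(", ", nonEmpty));

// 10 matches, 5 non-empty
// 1234, A123, AB12, ABC1, ABCD
```

Filtering leaves the original expression untouched; rewriting the pattern so that it can never match an empty string is another possibility, at the cost of a more complex expression.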
30 September 2015