BlackWaspTM


Regular Expressions
.NET 1.1+

Regular Expression Quantifiers

The seventh part of the Regular Expressions in .NET tutorial continues to look at the pattern-matching characters that can be used in regular expressions. This article describes quantifiers, which allow matching repeating items in the source text.

Matching an IP Address

For a final exercise, let's create a pattern that accurately matches IP addresses. Unlike the pattern mentioned at the beginning of the regular expressions tutorial, we won't simply look for four numbers separated by full stops, or periods. Instead, we'll create a pattern that ensures that each of the four numbers is in the range 0-255.

The pattern we need has to match a valid number four times. Let's first consider how we can match a single number that falls within the correct range. We know that each number has one, two or three digits, so a naive pattern would simply look for one to three digits. The regular expression below achieves this using the numeric digit character class and a quantifier.

\d{1,3}
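To see why this falls short, the pattern can be anchored so that it must match an entire string and then tried against a few values. This is a minimal sketch: the ^ and $ anchors and the NaiveOctet class are additions for illustration only, not part of the tutorial's pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration: tests whether a whole string
// matches the naive one-to-three-digit pattern.
public static class NaiveOctet
{
    // ^ and $ are added in this sketch so that the entire string must match,
    // rather than any run of digits inside it.
    private static readonly Regex Pattern = new Regex(@"^\d{1,3}$");

    public static bool IsMatch(string value)
    {
        return Pattern.IsMatch(value);
    }
}
```

With this sketch, NaiveOctet.IsMatch("999") returns true even though 999 cannot appear in an IP address, while "1000" is rejected only because it has four digits.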

The above pattern does not work correctly because it accepts any value from zero to 999. Let's improve it slightly. If the number is one or two digits long, we'll accept it. For three-digit values, we'll require the first digit to be either '1' or '2'. We can use an alternation with the three acceptable patterns, as follows:

(2\d{2}|1\d{2}|\d{1,2})
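The remaining problems can be confirmed by trying every whole number from 0 to 999 against an anchored copy of this pattern. The SecondAttempt class below is a hypothetical helper written for this sketch; the ^ and $ anchors are added so that partial matches are not counted.

```csharp
using System.Text.RegularExpressions;

// Illustrative sketch: the second attempt at a single-number pattern,
// anchored (as an addition for this example) so the whole string must match.
public static class SecondAttempt
{
    private static readonly Regex Pattern =
        new Regex(@"^(2\d{2}|1\d{2}|\d{1,2})$");

    public static bool IsMatch(string value)
    {
        return Pattern.IsMatch(value);
    }

    // Returns the largest value from 0 to 999 whose plain decimal form
    // (no leading zeroes) is accepted by the pattern.
    public static int HighestAccepted()
    {
        int highest = -1;
        for (int n = 0; n <= 999; n++)
        {
            if (Pattern.IsMatch(n.ToString()))
            {
                highest = n;
            }
        }
        return highest;
    }
}
```

Under these assumptions, HighestAccepted() returns 299, and IsMatch("05") returns true, showing the leading-zero problem described next.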

This limits the potential range of values to between zero and 299, which is still incorrect. It also means that we will accept two-digit numbers with leading zeroes. Let's deal with the leading zeroes first by using an alternation that permits either a single digit or two digits where the first is in the range 1-9:

(2\d{2}|1\d{2}|[1-9]\d|\d)
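One way to check the effect of this change is to compare the two patterns over every one- to three-digit string, including zero-padded strings such as "05". This sketch, with its hypothetical LeadingZeroCheck class and added ^ and $ anchors, lists the strings that the previous pattern accepted but the revised pattern rejects.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch comparing the previous and revised single-number
// patterns. The ^ and $ anchors are added so each must match a whole string.
public static class LeadingZeroCheck
{
    private static readonly Regex Before =
        new Regex(@"^(2\d{2}|1\d{2}|\d{1,2})$");
    private static readonly Regex After =
        new Regex(@"^(2\d{2}|1\d{2}|[1-9]\d|\d)$");

    // Lists every one- to three-digit string, including forms with leading
    // zeroes, that the earlier pattern accepted but the revised one rejects.
    public static List<string> NewlyRejected()
    {
        var rejected = new List<string>();
        int limit = 1;
        for (int length = 1; length <= 3; length++)
        {
            limit *= 10;
            for (int n = 0; n < limit; n++)
            {
                // The "D" format pads with leading zeroes, e.g. 5 -> "05" for D2.
                string candidate = n.ToString("D" + length);
                if (Before.IsMatch(candidate) && !After.IsMatch(candidate))
                {
                    rejected.Add(candidate);
                }
            }
        }
        return rejected;
    }
}
```

The list should contain only the ten strings "00" to "09". Three-digit strings with a leading zero were already rejected, because both patterns require a three-digit value to start with '1' or '2'.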

To obtain the correct range of 0-255, we'll remove the first option from the alternation and replace it with two new patterns. The first caters for the values 200-249 by matching a '2', followed by a digit in the range 0-4 and any final digit. The second matches the values from 250 to 255, using a pattern that begins with "25" and ends with a digit between zero and five. The updated pattern is shown below:

(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)
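Because the set of possible inputs is small, this pattern can be checked exhaustively. The following sketch (the OctetPatternCheck class and the ^ and $ anchors are assumptions for illustration) tests every one- to three-digit string, including zero-padded forms, and confirms that only the plain decimal forms of 0 to 255 are accepted.

```csharp
using System.Text.RegularExpressions;

// Sketch: checks the final single-number pattern, anchored with ^ and $ for
// this example, against every one- to three-digit string.
public static class OctetPatternCheck
{
    private static readonly Regex Octet =
        new Regex(@"^(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)$");

    public static bool IsMatch(string value)
    {
        return Octet.IsMatch(value);
    }

    // Returns true if exactly the 256 plain decimal forms of 0-255 are
    // accepted and every other string (out of range or zero-padded) is not.
    public static bool AcceptsExactly0To255()
    {
        int accepted = 0;
        int limit = 1;
        for (int length = 1; length <= 3; length++)
        {
            limit *= 10;
            for (int n = 0; n < limit; n++)
            {
                string candidate = n.ToString("D" + length);
                bool matched = Octet.IsMatch(candidate);
                // Expected only when in range and written without leading zeroes.
                bool expected = n <= 255 && candidate == n.ToString();
                if (matched != expected)
                {
                    return false;
                }
                if (matched)
                {
                    accepted++;
                }
            }
        }
        return accepted == 256;
    }
}
```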

We need to match the above pattern four times, with a full stop between each pair of numbers. One way to achieve this is to group the number pattern with a full stop and apply a quantifier that requires exactly three copies of that group, followed by one more copy of the number pattern. The full regular expression is as follows:

((2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)\.){3}(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)
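Used with Regex.Matches, the pattern finds addresses embedded in longer text. To validate a single string instead, one option is to anchor the pattern at both ends. The sketch below assumes that approach; the IpAddressCheck class and its IsValid method are hypothetical names, and the \A and \z anchors are additions to the tutorial's pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the tutorial): validates a whole string
// as a dotted-decimal IP address using the pattern built above.
public static class IpAddressCheck
{
    // \A and \z anchor the pattern to the start and the very end of the input.
    // \z is used rather than $, because $ also matches before a final newline.
    private static readonly Regex Pattern = new Regex(
        @"\A((2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)\.){3}(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)\z");

    public static bool IsValid(string candidate)
    {
        // Regex.IsMatch throws for null input, so treat null as invalid.
        return candidate != null && Pattern.IsMatch(candidate);
    }
}
```

This sketch accepts "192.0.0.255" and "10.1.30.150" but rejects "192.0.0.256", "1.01.1.1" and "10.1.30.150" followed by a newline.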

You can see from this example that regular expressions can be very powerful but can also become complex very quickly. To demonstrate the above pattern, try running the following code. It extracts IP addresses from the input string and ignores most of the invalid ones. Note, however, that for 192.0.0.256 the pattern still matches the leading part of the text, so '192.0.0.25' appears in the output.

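using System;
using System.Text.RegularExpressions;

// Each candidate address is on its own line of a verbatim string literal.
string input = @"Some of these are valid IP addresses:

0.1.2.3
10.20.30.40
100.200.300.400
192.0.0.256
192.0.0.255
192.0.0.249
10.1.30.150
1.01.1.1
99.00.99.00";

// Three "number followed by a full stop" groups, then a final number.
string pattern =
    @"((2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)\.){3}(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)";

// Report every match found in the input, with its starting position.
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
   (the indices assume Windows-style CRLF line endings in the source file)

Matched '0.1.2.3' at index 41
Matched '10.20.30.40' at index 50
Matched '192.0.0.25' at index 80
Matched '192.0.0.255' at index 93
Matched '192.0.0.249' at index 106
Matched '10.1.30.150' at index 119

*/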

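NB: The pattern could be modified further to include word boundary anchors (\b) at each end. This would prevent partial matches, such as '192.0.0.25' in the output above, where a match is immediately preceded or followed by further digits or letters.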
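A minimal sketch of that modification follows, assuming \b is placed at each end of the full pattern. The BoundedIpSearch class and its Find method are hypothetical; they run the original and the bounded patterns over the same text so that the results can be compared.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch of the suggested modification: the same IP address pattern,
// optionally wrapped in \b word-boundary anchors.
public static class BoundedIpSearch
{
    private const string Octet = @"(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)";

    // The tutorial's pattern, unchanged.
    private static readonly Regex Unbounded =
        new Regex(@"(" + Octet + @"\.){3}" + Octet);

    // The same pattern with a word boundary required at each end.
    private static readonly Regex Bounded =
        new Regex(@"\b(" + Octet + @"\.){3}" + Octet + @"\b");

    // Returns the text of every match of the chosen pattern in the input.
    public static List<string> Find(string input, bool useWordBoundaries)
    {
        Regex regex = useWordBoundaries ? Bounded : Unbounded;
        var results = new List<string>();
        foreach (Match match in regex.Matches(input))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

For the input "0.1.2.3 192.0.0.256 192.0.0.255 1.01.1.1", the unbounded pattern returns 0.1.2.3, 192.0.0.25 and 192.0.0.255, whereas the bounded version drops the partial match 192.0.0.25. Because \b treats a full stop as a non-word character, text such as "1.2.3.4.5" would still produce a match for 1.2.3.4; rejecting that case would need further changes to the pattern.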

30 September 2015