BlackWasp™


Regular Expressions
.NET 1.1+

Regular Expression Lookahead and Lookbehind

The tenth part of the Regular Expressions in .NET tutorial looks at zero-length lookahead and lookbehind assertions. These non-capturing group constructs allow a pattern to match based upon the text that precedes or follows another part of the pattern, or based upon the absence of such text.

Lookahead and Lookbehind

Zero-length lookahead and lookbehind assertions, sometimes known as lookaround assertions, are special types of non-capturing group. They allow you to perform complex matches based on the text that follows or precedes a pattern, without the text matched by the assertion forming part of the returned match. They are described as zero-length because they test the characters around a position without consuming them.
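As a quick illustration of the zero-length behaviour, the sketch below combines a lookbehind and a lookahead (both described below) so that the pattern matches an empty position between two letters rather than any characters. The class and method names, LookaroundDemo and SplitJoinedWords, are hypothetical, and the code assumes a modern C# compiler rather than .NET 1.1.

```csharp
using System.Text.RegularExpressions;

// LookaroundDemo and SplitJoinedWords are hypothetical names for this sketch.
public static class LookaroundDemo
{
    // Inserts a space wherever a lowercase letter is immediately followed
    // by an uppercase letter, e.g. "AndySmith" becomes "Andy Smith".
    public static string SplitJoinedWords(string input)
    {
        // (?<=[a-z]) asserts that a lowercase letter precedes the position and
        // (?=[A-Z]) asserts that an uppercase letter follows it. Neither group
        // consumes characters, so each match is an empty string lying between
        // the two letters, and Regex.Replace simply inserts a space there.
        return Regex.Replace(input, "(?<=[a-z])(?=[A-Z])", " ");
    }
}
```

Calling LookaroundDemo.SplitJoinedWords("JimBrownSueBrown") returns "Jim Brown Sue Brown". No letters are removed, because the matches themselves contain no characters.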

Positive Lookahead

The first type of lookaround assertion is positive lookahead. This construct appears after an initial pattern to be matched. It asserts that the text matched by the first part of the pattern must be followed immediately by text that matches the lookahead pattern. However, the returned match contains only the text matched by the first part.

To define a positive lookahead assertion, create a group, surrounded by parentheses, that starts with a question mark and an equals sign (?=). The text within the parentheses, after the initial equals sign, is the pattern of the lookahead.

To demonstrate, consider the following simple code:

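using System;
using System.Text.RegularExpressions;

string input = "Andy Smith\n"
             + "Jim Brown\n"
             + "Lisa Smith\n"
             + "Sue Brown";

// Match a forename at the start of a line only if whitespace and "Brown" follow it.
string pattern = @"^\w+(?=\sBrown)";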

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}.", match, match.Index);
}

/* OUTPUT

Matched 'Jim' at index 11.
Matched 'Sue' at index 32.

*/

The pattern above matches forenames for people with the surname, "Brown". The first part of the regular expression is "^\w+". This matches one or more word characters at the start of a line, extracting the forenames. Because RegexOptions.Multiline is specified, "^" matches at the start of every line rather than only at the start of the string. Without the lookahead, the pattern would match all four forenames in the input string.

The second part, "(?=\sBrown)", is the zero-length lookahead assertion. It requires the forename to be followed by a whitespace character and the surname, "Brown", but that text is not included in the match. This means that the whole regular expression returns only the names, "Jim" and "Sue".
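To make the difference between consuming text and merely asserting it more concrete, the following sketch runs the same search with and without the lookahead, using the sample data from the example above. LookaheadComparison and its members are hypothetical names chosen for this illustration.

```csharp
using System.Text.RegularExpressions;

// LookaheadComparison and its members are hypothetical names for this sketch.
public static class LookaheadComparison
{
    // Same sample data as the tutorial's examples.
    public const string Input = "Andy Smith\nJim Brown\nLisa Smith\nSue Brown";

    // Without lookahead: the whitespace and "Brown" are consumed,
    // so they become part of the returned match.
    public static Match Consuming()
    {
        return Regex.Match(Input, @"^\w+\sBrown", RegexOptions.Multiline);
    }

    // With lookahead: "\sBrown" is tested but not consumed,
    // so the match contains only the forename.
    public static Match WithLookahead()
    {
        return Regex.Match(Input, @"^\w+(?=\sBrown)", RegexOptions.Multiline);
    }
}
```

Both methods find their first match at index 11, but Consuming().Value is "Jim Brown" while WithLookahead().Value is just "Jim".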

Negative Lookahead

Negative lookahead assertions are arguably more powerful than their positive counterparts. They assert that the text matched by the initial part of the pattern must not be followed by text matching the lookahead pattern. To create a negative lookahead, replace the equals sign of the positive variation with an exclamation mark, so that the group starts with "(?!".

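The following sample matches the first word at the start of each line. The negative lookahead eliminates matches that are followed by a space and the surname, "Brown". The word boundary (\b) before the lookahead is important: without it, when a full forename such as "Jim" is rejected, the engine can backtrack and return a shorter match, such as "Ji", that is not followed by " Brown".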

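using System;
using System.Text.RegularExpressions;

string input = "Andy Smith\n"
             + "Jim Brown\n"
             + "Lisa Smith\n"
             + "Sue Brown";

// Match the first whole word of a line unless " Brown" follows it.
string pattern = @"^\w+\b(?! Brown)";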

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}.", match, match.Index);
}

/* OUTPUT

Matched 'Andy' at index 0.
Matched 'Lisa' at index 21.

*/
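The sketch below, which uses a hypothetical NegativeLookaheadDemo class, runs the negative lookahead pattern with and without the \b word boundary, so that the effect of backtracking on the "Brown" lines can be seen directly.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// NegativeLookaheadDemo and Matches are hypothetical names for this sketch.
public static class NegativeLookaheadDemo
{
    // Same sample data as the tutorial's examples.
    public const string Input = "Andy Smith\nJim Brown\nLisa Smith\nSue Brown";

    // Joins the value of every match with commas so two patterns are easy to compare.
    public static string Matches(string pattern)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(Input, pattern, RegexOptions.Multiline))
        {
            values.Add(match.Value);
        }
        return string.Join(",", values.ToArray());
    }
}
```

With the boundary, Matches(@"^\w+\b(?! Brown)") returns "Andy,Lisa". Without it, Matches(@"^\w+(?! Brown)") returns "Andy,Ji,Lisa,Su": once "Jim" is rejected, the shorter "Ji" is followed by "m Brown" rather than " Brown", so the negative lookahead succeeds.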

Positive Lookbehind

Positive lookbehind reverses the order of positive lookahead. The lookbehind part of the pattern, which usually appears at the start of a regular expression, specifies the text that must appear before the text that will be returned. The lookbehind assertion is defined as a group within parentheses. After the opening parenthesis, the characters "?<=" prefix the pattern to match.

In the following example, the lookbehind element requires the name, "Lisa", at the start of a line, followed by a space. This is followed by a pattern that matches one or more word characters. The result is that the surnames of people named "Lisa" are matched.

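using System;
using System.Text.RegularExpressions;

string input = "Andy Smith\n"
             + "Jim Brown\n"
             + "Lisa Smith\n"
             + "Sue Brown";

// Match a word only if "Lisa " at the start of a line immediately precedes it.
string pattern = @"(?<=^Lisa )\w+";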

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}.", match, match.Index);
}

/* OUTPUT

Matched 'Smith' at index 26.

*/
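The same surname could also be extracted without lookbehind, by matching the forename and placing the surname in a capturing group. The hypothetical LookbehindComparison class below contrasts the two approaches on the tutorial's sample data; it is a sketch rather than the only way to structure such code.

```csharp
using System.Text.RegularExpressions;

// LookbehindComparison and its members are hypothetical names for this sketch.
public static class LookbehindComparison
{
    // Same sample data as the tutorial's examples.
    public const string Input = "Andy Smith\nJim Brown\nLisa Smith\nSue Brown";

    // With lookbehind, "Lisa " is tested but not consumed,
    // so the match itself is the surname.
    public static Match WithLookbehind()
    {
        return Regex.Match(Input, @"(?<=^Lisa )\w+", RegexOptions.Multiline);
    }

    // Without lookbehind, the forename is consumed, so the surname
    // has to be read from numbered capturing group 1 instead.
    public static Match WithCapturingGroup()
    {
        return Regex.Match(Input, @"^Lisa (\w+)", RegexOptions.Multiline);
    }
}
```

WithLookbehind().Value is "Smith" at index 26, whereas WithCapturingGroup().Value is "Lisa Smith" at index 21, with "Smith" available through Groups[1].Value. The lookbehind version is convenient when the match itself must be exactly the wanted text, for example when it is replaced with Regex.Replace.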

Negative Lookbehind

As with lookahead, you can create negative lookbehind assertions. These specify that a match is only valid if it is not preceded by text matching the lookbehind pattern. Again, the equals sign of the positive variant is replaced with an exclamation mark, so that the group starts with "(?<!".

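The following code finds all surnames that do not follow the name, "Lisa", at the start of a line. The word boundary (\b) prevents a match from starting partway through a word; without it, "mith" from "Lisa Smith" would be returned, because it is preceded by "S" rather than by "Lisa ".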

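using System;
using System.Text.RegularExpressions;

string input = "Andy Smith\n"
             + "Jim Brown\n"
             + "Lisa Smith\n"
             + "Sue Brown";

// Match the last whole word of a line unless "Lisa " at the line start precedes it.
string pattern = @"(?<!^Lisa )\b\w+$";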

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
{
    Console.WriteLine("Matched '{0}' at index {1}.", match, match.Index);
}

/* OUTPUT

Matched 'Smith' at index 5.
Matched 'Brown' at index 15.
Matched 'Brown' at index 36.

*/
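To confirm the role of the \b word boundary in the negative lookbehind example, the hypothetical NegativeLookbehindDemo class below reports each match together with its index, so the pattern can be run with and without the boundary on the tutorial's sample data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// NegativeLookbehindDemo and Describe are hypothetical names for this sketch.
public static class NegativeLookbehindDemo
{
    // Same sample data as the tutorial's examples.
    public const string Input = "Andy Smith\nJim Brown\nLisa Smith\nSue Brown";

    // Formats each match as "value@index" so that its start position is visible.
    public static string Describe(string pattern)
    {
        var parts = new List<string>();
        foreach (Match match in Regex.Matches(Input, pattern, RegexOptions.Multiline))
        {
            parts.Add(match.Value + "@" + match.Index);
        }
        return string.Join(" ", parts.ToArray());
    }
}
```

Describe(@"(?<!^Lisa )\b\w+$") returns "Smith@5 Brown@15 Brown@36", matching the output above. Describe(@"(?<!^Lisa )\w+$") returns "Smith@5 Brown@15 mith@27 Brown@36": without \b, the engine can begin a match at the "m" of the third line's "Smith", because that position is preceded by "S" rather than "Lisa ".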
25 October 2015