BlackWasp™


Regular Expressions
.NET 1.1+

Regular Expression Substitutions

The twelfth part of the Regular Expressions in .NET tutorial looks at substitutions. These allow elements of an input string to be matched and replaced with alternative text. The replacement pattern can include literal characters and elements from the original match.

Substitutions

A very powerful feature of the regular expressions engine is the search and replace functionality provided by substitutions. A substitution performs a match using any of the pattern characters that we've already seen in the tutorial. Rather than simply returning the matches, the engine replaces each one with other text and returns the updated string.

The replacement string can be literal text in simple cases. For more complex operations, you can include information from the input string in the substitution, including the whole match or the contents of a captured group.

Simple Substitution

To perform a substitution, you use the Replace method of the Regex class instead of the Match method that we've seen in earlier articles. Replace is called in a similar manner to Match but takes an extra string parameter that supplies the replacement text. Instead of returning a Match object, it returns a new string in which every match has been replaced.

To demonstrate, try running the following code. The call to Replace includes three arguments. The first is the input string, which includes two numbers that could be credit card details. The second is the pattern to match, which looks for four groups of four numeric digits. The third provides the replacement string. In this case, the replacement text contains literal characters only; they replace the credit card numbers with asterisks.

// Requires the System and System.Text.RegularExpressions namespaces.
string input = "Don't give away your credit card number, "
             + "be it 1111 1111 1111 1111 or 8493 2349 5173 8495.";

string find = @"\d\d\d\d \d\d\d\d \d\d\d\d \d\d\d\d";
string replace = "**** **** **** ****";

string result = Regex.Replace(input, find, replace);

Console.WriteLine(result);

/* OUTPUT
 
Don't give away your credit card number, be it **** **** **** **** or **** **** **** ****.
 
*/
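The static Replace call above replaces every match. As a minimal sketch of two variations, the code below uses the instance form of Replace and the overload that accepts a maximum number of replacements. The shortened input string and the variable names are illustrative only, and the program is written with top-level statements (C# 9 or later); in older projects, place the statements inside a Main method.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Cards: 1111 1111 1111 1111 and 8493 2349 5173 8495.";

// Instance form: the pattern is supplied once, to the constructor.
// \d{4} is equivalent to \d\d\d\d.
Regex cardNumber = new Regex(@"\d{4} \d{4} \d{4} \d{4}");
string mask = "**** **** **** ****";

// Replace every match in the input.
string all = cardNumber.Replace(input, mask);

// Replace at most one match, working from the start of the input.
string firstOnly = cardNumber.Replace(input, mask, 1);

Console.WriteLine(all);
Console.WriteLine(firstOnly);

/* OUTPUT

Cards: **** **** **** **** and **** **** **** ****.
Cards: **** **** **** **** and 8493 2349 5173 8495.

*/
```

The instance form is convenient when the same pattern is applied to many input strings, as the Regex object can be reused.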

Including Captured Groups in a Replacement String

Substitutions become much more powerful when you include elements of the input string in the replacement. A common approach is to copy information from a captured group into the replacement text. You can do so using numbered or named captured groups. For numbered groups, include a dollar sign ($) followed by the group number in the substitution string. For each match, this placeholder is replaced with the text captured by that group.

To demonstrate, run the following program. In this case, the input string contains a list of names, each on a new line. Some of the names are provided with the forename before the surname, whilst others have the surname first, separated from the forename with a comma and a space.

The code matches names where the surname appears before the forename. The surname is included in the first captured group and the forename in the second. All of the matches are replaced using the pattern, "$2 $1". This swaps the order of the names and removes the comma.

string input = "Bob Smith\n"
             + "Green, Mel\n"
             + "Sam Jones\n"
             + "Black, Liz\n"
             + "White, Tim\n";

string find = @"^(\w+), (\w+)$";
string replace = "$2 $1";

string result = Regex.Replace(input, find, replace, RegexOptions.Multiline);

Console.WriteLine(result);

/* OUTPUT

Bob Smith
Mel Green
Sam Jones
Liz Black
Tim White

*/
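Numbered groups can also be mixed with literal text in the replacement. The following sketch revisits the credit card scenario from the first example and keeps the final four digits of each number visible. The shortened input string and the choice to capture only the last block of digits are assumptions made for illustration. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Card 1111 1111 1111 1111 or 8493 2349 5173 8495.";

// Only the final block of four digits is captured, as group 1.
string find = @"\d\d\d\d \d\d\d\d \d\d\d\d (\d\d\d\d)";

// Literal asterisks followed by the contents of group 1.
string replace = "**** **** **** $1";

string masked = Regex.Replace(input, find, replace);

Console.WriteLine(masked);

/* OUTPUT

Card **** **** **** 1111 or **** **** **** 8495.

*/
```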

If you are using named groups, you can use the name instead of the number after the dollar sign, provided that you surround it with braces. The example below is functionally equivalent to the previous one, but uses named groups.

string input = "Bob Smith\n"
             + "Green, Mel\n"
             + "Sam Jones\n"
             + "Black, Liz\n"
             + "White, Tim";

string find = @"^(?<Surname>\w+), (?<Forename>\w+)$";
string replace = "${Forename} ${Surname}";

string result = Regex.Replace(input, find, replace, RegexOptions.Multiline);

Console.WriteLine(result);

/* OUTPUT
 
Bob Smith
Mel Green
Sam Jones
Liz Black
Tim White
            
*/

NB: If your replacement string includes numeric digits, they can become ambiguous if they appear next to a numbered group. You can surround the group number with braces to avoid confusion. For example, "$11" refers to group 11, and is treated as literal text if the pattern has no such group, whereas "${1}1" inserts the contents of group one and follows it with the character '1'.
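The difference between the two forms can be seen in a short sketch. The room labels and variable names here are invented for the illustration, and the pattern defines only one group, so "$11" cannot refer to a real group. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Room A, Room B";
string find = @"Room (\w)";   // one capturing group: the room letter

// Braces make it clear that group 1 is followed by a literal '1'.
string braced = Regex.Replace(input, find, "Room ${1}1");

// Without braces the placeholder is read as group 11. This pattern has
// no such group, so the result differs from the braced version.
string unbraced = Regex.Replace(input, find, "Room $11");

Console.WriteLine(braced);
Console.WriteLine(unbraced);

/* OUTPUT (first line only)

Room A1, Room B1

*/
```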

Another option is to include the contents of the last captured group in the pattern, regardless of its name or number. To do so, include the placeholder, "$+". In the following example the last group is Forename, so "$+" gives the same result as "${Forename}":

string input = "Bob Smith\n"
             + "Green, Mel\n"
             + "Sam Jones\n"
             + "Black, Liz\n"
             + "White, Tim";

string find = @"^(?<Surname>\w+), (?<Forename>\w+)$";
string replace = "$+ ${Surname}";

string result = Regex.Replace(input, find, replace, RegexOptions.Multiline);

Console.WriteLine(result);

/* OUTPUT
 
Bob Smith
Mel Green
Sam Jones
Liz Black
Tim White
            
*/
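Earlier in the article we noted that a replacement can include the whole match as well as individual groups. As a brief sketch, the placeholder "$&" inserts the entire matched text, and "$0" does the same because group zero always represents the whole match. The input string and the square brackets are arbitrary choices for the demonstration. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Order 66 shipped with 12 items.";
string find = @"\d+";   // one or more digits; no capturing groups needed

// "$&" inserts the whole match, here wrapped in square brackets.
string wrapped = Regex.Replace(input, find, "[$&]");

// "$0" refers to group zero, which is also the whole match.
string wrappedZero = Regex.Replace(input, find, "[$0]");

Console.WriteLine(wrapped);
Console.WriteLine(wrappedZero);

/* OUTPUT

Order [66] shipped with [12] items.
Order [66] shipped with [12] items.

*/
```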
23 November 2015