BlackWasp™
Regular Expressions
.NET 1.1+

Regular Expression Comments

The sixteenth part of the Regular Expressions in .NET tutorial describes how comments can be added to a regular expression pattern. This allows complex patterns to include explanatory text.

Comments

When writing code, it is usually advisable to minimise the use of comments that describe the functionality. When code is difficult to understand, it is better to refactor it to make it more maintainable than to add a comment. If you rely on a comment instead, the code may be updated in the future without the comment being changed. This leads to out-of-date comments that can cause confusion.

When working with regular expressions, the language is so terse that creating readable patterns quickly becomes incredibly difficult or even impossible. In such situations, a well-worded comment may be essential. You could add comments near the regular expression using the syntax of the language you are using to call the regular expressions engine. You can also include them within the pattern itself.

There are two ways to add a comment to a regular expression. The first is to create a group specifically for the comment. As with other groups, this is defined with a pair of parentheses. The comment is placed within the parentheses and prefixed with a question mark and a hash symbol. The comment ends at the first closing parenthesis, so the comment text cannot contain one. The syntax for a comment is therefore:

(?# Comment)

The following code demonstrates this syntax. The example looks for hyperlinks in an HTML document. The three comments show the parts of the regular expression that locate the opening and closing tags of the hyperlink and the information that would be displayed in a web browser.

// Requires: using System; using System.Text.RegularExpressions;
string input = "For more information use the "
             + "<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a> "
             + "or check the list of "
             + "<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently "
             + "asked questions</a>.";

string pattern = "(<a href=')(.*?)('>)(?#Opening tag)(.*?)(?#Display)(</a>)(?#Closing tag)";

foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("Matched '{0}'", match.Value);
}

/* OUTPUT
 
Matched '<a href='http://www.blackwasp.co.uk/Contact.aspx'>contact form</a>'
Matched '<a href='http://www.blackwasp.co.uk/FAQ.aspx'>frequently asked questions</a>'
 
*/
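Comment groups are discarded when the pattern is parsed, so they should neither capture text nor change the numbering of the other groups. The following is a minimal sketch of this behaviour. The year and month pattern, the input text and the CommentGroupDemo class are hypothetical and are not part of the example above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentGroupDemo
{
    // Hypothetical patterns for illustration: identical apart from the comments.
    public const string Commented = @"(\d{4})(?#Year)-(\d{2})(?#Month)";
    public const string Plain = @"(\d{4})-(\d{2})";

    // Returns the number of groups in the first match, including group 0 (the whole match).
    public static int GroupCount(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        string input = "Published 2015-12";

        Match commented = Regex.Match(input, Commented);
        Match plain = Regex.Match(input, Plain);

        // Both patterns match the same text.
        Console.WriteLine(commented.Value == plain.Value);

        // The comment groups add no captures: whole match, year and month only.
        Console.WriteLine(GroupCount(input, Commented));

        // Group 2 is still the month, not a comment.
        Console.WriteLine(commented.Groups[2].Value);
    }
}
```

Run as written, this should print True, 3 and 12, showing that the commented and uncommented patterns behave identically.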

If you are using the option to ignore unescaped white space within a regular expression, enabled with either the inline (?x) option or RegexOptions.IgnorePatternWhitespace, you can use a second comment syntax. In this mode comments are prefixed with a hash symbol (#). The comment extends to the end of the line.

# Comment
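As a minimal sketch of the two ways of enabling this mode, the code below applies the same commented pattern once with the inline (?x) construct and once with RegexOptions.IgnorePatternWhitespace. The input text, the patterns and the HashCommentDemo class are assumptions made purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashCommentDemo
{
    // Inline option: (?x) at the start switches the mode on within the pattern.
    public const string InlinePattern = @"(?x) \d+ # one or more digits";

    // The same pattern without the inline option; the mode must be passed as an argument.
    public const string OptionPattern = @"\d+ # one or more digits";

    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string input, string pattern, RegexOptions options)
    {
        return Regex.Match(input, pattern, options).Value;
    }

    public static void Main()
    {
        string input = "Order 66 shipped";

        // Both of these find the digits, because the comment is ignored.
        Console.WriteLine(FirstMatch(input, InlinePattern, RegexOptions.None));
        Console.WriteLine(FirstMatch(input, OptionPattern, RegexOptions.IgnorePatternWhitespace));

        // With the mode switched off, the space and comment text are treated
        // as literal characters, so nothing matches and the result is empty.
        Console.WriteLine(FirstMatch(input, OptionPattern, RegexOptions.None).Length);
    }
}
```

The first two calls should both print 66. The third prints a length of 0, because the comment text is then treated as part of the pattern.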

The following example includes a regular expression that extracts IP addresses. It includes a comment defined only by the hash symbol. If you were to remove the RegexOptions.IgnorePatternWhitespace argument from the call to Regex.Matches, no matches would be found because the space and the comment would be seen as part of the pattern. Note that the pattern is not anchored, so the invalid address 192.0.0.256 still produces a partial match of 192.0.0.25.

string input = @"Some of these are valid IP addresses:
 
0.1.2.3
10.20.30.40
100.200.300.400
192.0.0.256
192.0.0.255
192.0.0.249
10.1.30.150
1.01.1.1
99.00.99.00";

string pattern = @"((2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d)\.){3}"
               + @"(2[0-4]\d|25[0-5]|1\d{2}|[1-9]\d|\d) #Extracts IP addresses";

foreach (Match match in Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace))
{
    Console.WriteLine("Matched '{0}' at index {1}", match.Value, match.Index);
}

/* OUTPUT
     
Matched '0.1.2.3' at index 41
Matched '10.20.30.40' at index 50
Matched '192.0.0.25' at index 80
Matched '192.0.0.255' at index 93
Matched '192.0.0.249' at index 106
Matched '10.1.30.150' at index 119
              
*/
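Because each hash comment ends at a line break, this mode is most useful when a pattern is spread over several lines with one comment per element. The sketch below assumes a hypothetical pattern for text such as "Issue #42". The class name and input are illustrative only. Note that a literal space or hash symbol must be escaped in this mode, as otherwise it would be ignored or would start a comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiLineCommentDemo
{
    // A verbatim string lets the pattern span several lines.
    // Unescaped white space, including the line breaks, is ignored.
    public const string Pattern = @"
        Issue        # the literal word
        \            # an escaped space, matched literally
        \#           # an escaped hash symbol, matched literally
        (\d+)        # the issue number, captured as group 1
    ";

    // Returns the captured issue number, or an empty string if there is no match.
    public static string IssueNumber(string input)
    {
        Match match = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(IssueNumber("See Issue #42 for details"));
    }
}
```

This should print 42. Laying the pattern out in this way keeps each comment next to the element it describes, which makes it less likely that the two will drift apart when the pattern is edited.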
21 December 2015