.NET 1.1+

The System.Uri Class

by Richard Carr, published at http://www.blackwasp.co.uk/Uri.aspx

Software that uses networked resources, web sites or remote files often makes use of Uniform Resource Identifiers to locate those resources. Although these can be held in simple strings, using the System.Uri class provides many additional benefits.

Uniform Resource Identifiers

A uniform resource identifier (URI) is a string that identifies a resource, usually by describing its location on a network. URIs are most commonly used by web sites to identify individual pages and linked images. For example, the URI of this page is "http://www.blackwasp.co.uk/Uri.aspx". You may also encounter them when locating a file or folder. For example, you might see the URI, "file:///c:/windows/system/vga.drv", which points to a local file.

URIs have a specific structure, although most of the parts are optional. They can include the following sections:

URI Scheme. The scheme appears at the start of the URI. It describes the type of resource or the method by which it is retrieved. Common examples are "http", "ftp" and "file". The URI scheme is followed by a colon character (:).
Authority. The authority, if present, follows the scheme's colon and is introduced by two forward slashes (//). This is generally used to provide the name or address of a server, or a domain name in the case of web site addresses. The authority for this web page is "www.blackwasp.co.uk". When a port number is required to access a resource, it appears at the end of the authority, separated from the name or address by a colon. For example, "www.blackwasp.co.uk:80".
User Information. When you need to provide a user name and password to a URI, these can be included in the user information section. This appears immediately before the host name, with a colon between the user name and password and an at sign (@) following the user information. For example, "http://bob:password@localhost". Formally, the URI specification treats the user information as part of the authority, although the Uri class reports the two separately.
Path. Conceptually, the path section of the URI provides a path to the resource. This can be literally true but often the path does not relate to any underlying structure. For example, this web page's path is "/Uri.aspx". However, due to URL rewriting, this does not mean that the Uri.aspx file is in the root of the web site, or even that Uri.aspx exists as a file at all. Paths are made up of segments, each separated from the previous by a forward slash character. For example, in the file URI mentioned earlier the path is "c:/windows/system/vga.drv".
Query. Some resources are fixed, requiring no more information than already described above. Others are more complex and produce different results according to a query. A query is simply an extension to the URI, appended after the path and a question mark (?). The structure of a query is not fixed but you will often see a query containing a set of key / value pairs, each separated by an ampersand (&). In the URI, "http://www.blackwasp.co.uk/CSharpArticles.aspx?page=2" the query is "page=2".
Fragment. The final part that can be included in a URI is the fragment. This is an additional string that identifies a subsection of the resource that the rest of the URI describes. It appears at the end of the URI after a hash symbol (#). When used in web sites, the fragment is usually used to automatically scroll the user's browser to a specific part of a web page.

Sometimes you will find that a URI is described as containing four parts. These are the scheme, hierarchical part, query and fragment. The hierarchical part in this description is the combination of authority, user information and path.
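
As a brief illustration of these sections, the sketch below uses the Uri class, described in the next section, to split a URI into its parts. The UriParts class and Describe method are hypothetical names chosen for this example, and the sample URI simply combines the fragments used in the list above.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class UriParts
{
    // Parses an absolute URI and returns its sections on a single line.
    public static string Describe(string text)
    {
        Uri uri = new Uri(text);

        return string.Format(
            "scheme={0}; user={1}; host={2}; port={3}; path={4}; query={5}; fragment={6}",
            uri.Scheme, uri.UserInfo, uri.Host, uri.Port,
            uri.AbsolutePath, uri.Query, uri.Fragment);
    }
}
```

Calling Describe with "http://bob:password@www.blackwasp.co.uk:8080/CSharpArticles.aspx?page=2#top" returns "scheme=http; user=bob:password; host=www.blackwasp.co.uk; port=8080; path=/CSharpArticles.aspx; query=?page=2; fragment=#top". Note that the Query and Fragment properties keep their leading question mark and hash symbol.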

Uri Class

URIs are designed so that they can be held in simple strings. However, processing strings to extract parts of a URI, or to modify or compare URIs, is tedious and error-prone. To avoid this problem, the .NET Framework includes the Uri class as part of the System namespace. This class holds a URI and includes many properties and methods that allow it to be queried and manipulated.

We can create a Uri object that holds a URI using the code below. Note that the URI we wish to store is passed to the constructor using a string parameter.

Uri uri = new Uri("http://www.blackwasp.co.uk");
Console.WriteLine(uri);

/* OUTPUT

http://www.blackwasp.co.uk

*/

When you create a Uri object in this manner you obtain an immediate benefit over holding the URI in a string. During construction, the URI provided is validated. If it is invalid, a UriFormatException is thrown. You can see this in the sample code below. Here the URI includes the common mistake of using backslashes instead of forward slashes.

Uri uri = new Uri("http:\\www.blackwasp.co.uk");
Console.WriteLine(uri);

/* EXCEPTION

Unhandled Exception: System.UriFormatException:
Invalid URI: The Authority/Host could not be parsed.

*/
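
If you would rather test a string than let the exception go unhandled, two approaches are possible. The following is a minimal sketch, assuming .NET 2.0 or later for the first method because it relies on Uri.TryCreate. The UriValidation class and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class UriValidation
{
    // Returns true when the text can be parsed as an absolute URI.
    // No exception is thrown for badly formed input.
    public static bool IsValidAbsoluteUri(string text)
    {
        Uri parsed;
        return Uri.TryCreate(text, UriKind.Absolute, out parsed);
    }

    // Performs the same check with the constructor, catching the
    // UriFormatException that is thrown for an invalid URI.
    // A null argument still throws ArgumentNullException.
    public static bool IsValidViaConstructor(string text)
    {
        try
        {
            new Uri(text);
            return true;
        }
        catch (UriFormatException)
        {
            return false;
        }
    }
}
```

Both methods return true for "http://www.blackwasp.co.uk" and false for a string such as "not a uri". The TryCreate version is usually preferable when invalid input is expected, as it avoids the cost of throwing and catching an exception.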

Absolute and Relative URIs

URIs can broadly be categorised as absolute or relative. An absolute URI includes a scheme and authority, whereas a relative URI omits these details. Relative URIs assume that these parts are already known. For example, in a web site a relative URI can be used for a link to another page. When the link is clicked, the relative URI will be combined with the authority and scheme of the page that the link appears within in order to find the new resource.
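
The way a browser combines a relative link with the address of the current page can be sketched with the Uri constructor that accepts a base Uri and a relative string. The UriResolver class and Resolve method are hypothetical names used only for this illustration.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class UriResolver
{
    // Combines a relative link with the absolute address of the page
    // that contains it and returns the resulting absolute URI.
    public static string Resolve(string pageAddress, string link)
    {
        Uri page = new Uri(pageAddress);   // must be an absolute URI
        Uri target = new Uri(page, link);  // scheme and authority come from the page

        return target.AbsoluteUri;
    }
}
```

For example, Resolve("http://www.blackwasp.co.uk/CSharpArticles.aspx", "Uri.aspx") returns "http://www.blackwasp.co.uk/Uri.aspx". The scheme and authority are taken from the page address and the final path segment is replaced by the relative link.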

If you attempt to create a Uri object for a relative URI using the simple constructor used previously, you will find that an exception is thrown.

Uri uri = new Uri("/Uri.aspx");
Console.WriteLine(uri);

/* EXCEPTION

Unhandled Exception: System.UriFormatException:
Invalid URI: The format of the URI could not be determined.

*/

The problem here is that the single-parameter constructor assumes that the string provided will hold an absolute URI. To create a relative Uri you must add a second parameter, which is of the UriKind enumerated type, introduced in .NET 2.0. This specifies the type of URI being used. You can specify Absolute, Relative or RelativeOrAbsolute. The third of these options tells the constructor that the type is unknown.

Uri uri = new Uri("/Uri.aspx", UriKind.Relative);
Console.WriteLine(uri);

/* OUTPUT

/Uri.aspx

*/
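
When the kind of URI is not known in advance, RelativeOrAbsolute can be paired with the IsAbsoluteUri property to find out what was parsed. The following is a minimal sketch of that idea. The UriClassifier class and Classify method are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the .NET framework.
public static class UriClassifier
{
    // Parses text of unknown kind and reports which kind of URI it holds.
    public static string Classify(string text)
    {
        Uri uri;
        if (!Uri.TryCreate(text, UriKind.RelativeOrAbsolute, out uri))
        {
            return "invalid";
        }

        return uri.IsAbsoluteUri ? "absolute" : "relative";
    }
}
```

Classify("http://www.blackwasp.co.uk/Uri.aspx") returns "absolute", whereas Classify("articles/Uri.aspx") returns "relative".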


20 June 2012