C# String Data Type (.NET 1.1)
The sixteenth part of the C# Fundamentals tutorial begins an examination of the string data type. This is possibly the most important native data type in the C# programming language. Using strings, we can process textual information.
The String Data Type
The string data type defined within C# holds a series of Unicode characters. The series can be as short as zero characters in length, known as an empty string, or can be much longer. The theoretical limit to string size is approximately two billion characters, because the length of a string is held in a 32-bit signed integer. However, this is generally not achievable, as the runtime and operating system will not necessarily provide enough memory to store and process such large strings.
Strings Are Objects
Variable types in C# can be broadly categorised into two groups: value types, which include structures, and reference types, which include classes. All of the data types that have been examined so far in this tutorial have been value types. Variables of these types contain their values directly. They can also provide methods and properties, as seen with the nullable types, which provide properties such as HasValue and methods such as GetValueOrDefault.
The string data type is not a value type or structure; it is a class. Classes differ from structures in many ways that are beyond the scope of this tutorial. They are designed for object-oriented programming which will be described in a future tutorial. However, there are some key points about classes that will be described in this article and that will arise later in the tutorial.
When a class is instantiated, the item that is created is known as an object, so all strings are actually objects of type string. Any variable that refers to an object can be set to null to indicate that it does not yet refer to anything. Therefore any variable of the string data type is nullable. Unlike the nullable numeric and character data types described earlier in this tutorial, this applies in all versions of the .NET framework.
A second important fact about objects is that rather than being value types, they are reference types. This means that the data in the object is stored somewhere in the computer's memory and the variable holds a reference to that area of memory. Two variables of the same type can therefore refer to the same area of memory, in which case they refer to the same object; a change made through one variable is visible through the other. This will become more important when we consider the creation of methods later in the tutorial.
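The way two variables can refer to one object can be sketched as follows. This is a minimal illustration rather than part of the original tutorial code; the class name ReferenceDemo and the variable names are assumptions chosen for the example. It uses the Object.ReferenceEquals method, which reports whether two references point to the same object.

```csharp
using System;

class ReferenceDemo
{
    static void Main()
    {
        string first = "Hello world";
        string second = first;   // copies the reference, not the characters

        // Both variables now refer to the same string object in memory.
        Console.WriteLine(Object.ReferenceEquals(first, second)); // True
    }
}
```

Assigning one string variable to another is therefore cheap, as no character data is copied.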
Immutability
An important aspect of strings is that they are read-only, or immutable. This means that once a string has been created, its contents cannot be modified. This can be a surprise, even to experienced C# developers, as strings can apparently be created and modified easily using the language. However, when a string is modified, the .NET framework actually creates a new string containing the modified text and leaves the original unchanged; the original is discarded once nothing refers to it. For strings that are modified infrequently this provides an efficient method of operation. Frequently modified strings can quickly become inefficient, as every change allocates a new string; another class, StringBuilder, should be used in this situation. The StringBuilder class is a .NET framework class that is not specific to C# and will be described in a future article.
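The effect of immutability can be observed with a short experiment. The following is a sketch only; the class name ImmutabilityDemo and the variable names are assumptions for illustration. It keeps a second reference to a string, "modifies" the first variable, and shows that the original text is untouched. It then gives a brief preview of StringBuilder from the System.Text namespace, which appends to an internal buffer instead.

```csharp
using System;
using System.Text;

class ImmutabilityDemo
{
    static void Main()
    {
        string original = "Hello";
        string alias = original;      // both variables refer to the same string

        original += " world";         // creates a new string; "Hello" is unchanged

        Console.WriteLine(original);  // Hello world
        Console.WriteLine(alias);     // Hello

        // StringBuilder appends to an internal buffer rather than
        // creating a new string for every modification.
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 5; i++)
        {
            builder.Append(i);
        }
        Console.WriteLine(builder.ToString()); // 01234
    }
}
```

After the concatenation, the two variables refer to different objects, which is why the alias still shows the original text.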
String Assignment Literals
The value of a string is assigned using the standard assignment operator (=). As with the character data type, when assigning a literal value, delimiting symbols must be used to identify the start and the end of the string information. Whereas character literals are delimited by apostrophes, string literals are delimited by quotation marks (").
string helloString = "Hello world";
As indicated earlier, strings are objects, so string variables are capable of holding null values. These are assigned in the same manner as when using nullable numeric data types.
string nullString = null;
A null string indicates that the variable has an undefined value. It is often the case that you wish to store an empty, or zero-length, string in a string variable. This differs from null as the value is defined, even though it contains no characters. There are two methods for assigning a variable an empty string value. You can simply use two quotation marks with nothing in between or you can use the String.Empty value. The second method is sometimes preferred because it can be considered more readable.
string emptyString;
emptyString = ""; // emptyString is empty
emptyString = String.Empty; // emptyString is still empty
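The difference between a null string and an empty string can be checked directly. The sketch below is illustrative only; the class name EmptyVersusNull is an assumption, and the checks deliberately use only comparisons and the Length property so that they apply to every version of the framework.

```csharp
using System;

class EmptyVersusNull
{
    static void Main()
    {
        string nullString = null;
        string emptyString = String.Empty;

        Console.WriteLine(nullString == null);   // True
        Console.WriteLine(emptyString == null);  // False
        Console.WriteLine(emptyString == "");    // True
        Console.WriteLine(emptyString.Length);   // 0

        // Reading nullString.Length would throw a NullReferenceException,
        // so the null check must come before any use of the value.
        if (nullString != null && nullString.Length > 0)
        {
            Console.WriteLine(nullString);
        }
    }
}
```

Because the && operator evaluates its right-hand side only when the left-hand side is true, the Length property is never read on the null reference.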
String Constructors
All classes contain one or more constructors. A constructor is a special method in a class that is executed when a new object of that class is instantiated. Usually a constructor prepares the object's values and obtains any other resources that it may require, sometimes using parameters provided by the programmer.
The string class defines several constructors, each identified by the parameters used. Most of these are beyond the scope of this tutorial. However, one interesting constructor allows the programmer to specify a character and an integer to create a string of a single character repeated many times. This constructor is used in the following example. Note the use of the new keyword that is used whenever objects are instantiated in this way, and the constructor parameters within the parentheses.
string repeating = new string('.', 15); // repeating = "..............."
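One possible use of the repeating-character constructor is underlining a heading in console output. The following sketch assumes a hypothetical class name, ConstructorDemo, and an arbitrary title. It also shows a second constructor, not covered by the text above, that builds a string from an array of characters.

```csharp
using System;

class ConstructorDemo
{
    static void Main()
    {
        string title = "String Constructors";

        // Repeat a dash once for every character in the title.
        string underline = new string('-', title.Length);

        Console.WriteLine(title);
        Console.WriteLine(underline);

        // Another constructor creates a string from a character array.
        char[] letters = new char[] { 'C', '#' };
        string fromArray = new string(letters);
        Console.WriteLine(fromArray); // C#
    }
}
```

The underline always matches the title's length because the repeat count is taken from the title's Length property rather than being typed by hand.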
Escape Characters
The string literals described so far are useful in many cases. However, there are some characters that cannot be easily included. As an example, a quotation mark cannot be included in a string literal as described above as it would signify the end of the literal. The following code would fail to compile for this reason.
string invalid = "This is a "string".";
Console.WriteLine(invalid);
To allow a quotation mark to be included in a string literal, a special code or escape character sequence is inserted. All escape character sequences are represented as a backward slash (\) followed by one or more further characters that determine the information that will be inserted in their place. For a quotation mark, the escape code is a backward slash and a quotation mark (\"). To make the previous example valid, the following code is used instead.
string valid = "This is a \"string\".";
Console.WriteLine(valid); // Outputs: This is a "string".
A similar problem exists for character literals where the apostrophe character is required. This is solved with a similar escape character sequence (\').
char apostrophe = '\'';
There are several other escape characters available to the C# programmer. They include the following:
| Escape Character | Purpose |
|---|---|
| \\ | Backslash (\). As the backward slash character defines an escape character sequence, the double-backslash is required to indicate the insertion of a single backslash. |
| \n | New Line. Inserts a new line into the string literal. When output, the text starts a new line when the control character is reached. |
| \t | Horizontal Tab. Inserts a horizontal tab into a text string, moving the current position to the next tab stop in a similar manner to pressing the tab key in a word processor. |
| \0 | Unicode Character Zero / Null. This character is often used as a terminator, marking the end of a string or data stream. |
| \a | Alert. In some scenarios this character sounds an alert through the computer's speaker. |
| \b | Backspace. Emits a backspace character. |
| \f | Form Feed. This character instructs a printer to execute a form feed, ejecting the current sheet of paper and readying the next sheet. |
| \r | Carriage Return. This is similar to the new line character when used for screen output. Some printers use this as an indicator to return to the start of the current line. In this case, to begin a new line both a carriage return and new line character (\r\n) are required. |
| \v | Vertical Tab. Inserts a vertical tab. |
| \uxxxx | Inserts the character with the Unicode character number xxxx. The xxxx portion is a four digit hexadecimal number. |
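A few of these escape sequences can be tried in a short program. This is a sketch for illustration; the class name EscapeDemo and the file path are assumptions, not values from the tutorial. Note that an escape sequence occupies several characters in the source code but only one character in the resulting string.

```csharp
using System;

class EscapeDemo
{
    static void Main()
    {
        // Each \\ in the literal becomes a single backslash in the string.
        string path = "C:\\Temp\\notes.txt";
        Console.WriteLine(path);          // C:\Temp\notes.txt
        Console.WriteLine(path.Length);   // 17

        // \t moves to the next tab stop; \n starts a new line.
        Console.WriteLine("Name\tValue");
        Console.WriteLine("Line one\nLine two");

        // \u0041 is the Unicode character number for a capital A.
        string letter = "\u0041";
        Console.WriteLine(letter == "A"); // True
    }
}
```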
Verbatim String Literals
C# supports a second type of string literal known as a verbatim string literal. This literal does not require the use of escape characters to define special characters. Instead, any information in the source code, including new lines, is included in the string. To define a verbatim string literal, an @ symbol is placed before the opening quotation mark. The only character that requires a different action is the quotation mark itself, which must be entered twice to indicate a single character.
string literal = @"Literal strings can include backslash (\) characters.";
string quoted = @"This is a ""string"".";
string multiline = @"This is line one.
This is line two.";
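Verbatim and standard literals are two ways of writing the same string values, which can be confirmed by comparing them. The sketch below is illustrative; the class name VerbatimDemo and the file path are assumptions made for the example.

```csharp
using System;

class VerbatimDemo
{
    static void Main()
    {
        // The same path written with escape sequences and as a verbatim literal.
        string escaped = "C:\\Temp\\notes.txt";
        string verbatim = @"C:\Temp\notes.txt";
        Console.WriteLine(escaped == verbatim);             // True

        // A doubled quotation mark in a verbatim literal matches \" in a standard one.
        string quotedEscaped = "This is a \"string\".";
        string quotedVerbatim = @"This is a ""string"".";
        Console.WriteLine(quotedEscaped == quotedVerbatim); // True

        // Escape sequences are not processed in verbatim literals,
        // so this string holds a backslash followed by the letter n.
        string notNewLine = @"\n";
        Console.WriteLine(notNewLine.Length);               // 2
    }
}
```

Verbatim literals are particularly convenient for file paths, where backslashes would otherwise each need to be doubled.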