.NET 3.5+

Language-Integrated Query

by Richard Carr, published at http://www.blackwasp.co.uk/Linq.aspx

The first part of the LINQ to Objects tutorial describes the language-integrated query (LINQ) features that were introduced in version 3.5 of the .NET framework. LINQ provides a standardised means to query information from many different data sources.

What is LINQ?

A common development task is to query information, extracting a filtered list of items that may be ordered, grouped or aggregated. The data may be retrieved from an existing in-memory collection, a database, an XML file or many other sources. Usually, the type of the data source dictates the syntax of the query. This syntax can vary greatly between sources, which reduces the code's portability. Often the query is built within a string, such as a SQL command. This increases the risk of invalid or inaccurate queries, as the compiler cannot check the syntax of a string and the integrated development environment (IDE) provides little or no support for writing it.

To alleviate these problems, Microsoft introduced language-integrated query (LINQ) into the .NET framework version 3.5. LINQ provides a set of standard query operators that can be used to perform simple or complex queries against a number of different data sources. The queries are integrated with other source code written in .NET languages such as C# or Visual Basic. This allows the compiler to check the queries and Visual Studio to provide IntelliSense support.

In this tutorial we will be examining LINQ to Objects, which allows you to execute queries against in-memory data structures. Other LINQ providers are available to allow querying against SQL Server databases, XML, DataSets and many other data sources. You can also create your own providers to query domain-specific data sources.

A number of new features were added to the .NET framework and the C# programming language to support LINQ. Extension methods and lambda expressions are used extensively to build queries. To allow data to be returned from a query without first defining a class or structure, LINQ often generates results using anonymous types. Generics, which were introduced in version 2.0 of the .NET framework, are also important. If you are unsure of any of these topics, you may wish to read about them before continuing.
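The following sketch shows these features working together in one small console program. The StringSequenceExtensions class, its LongerThan extension method and the LanguageFeaturesDemo class are hypothetical names invented for this illustration; Select is the standard operator from System.Linq. The var keyword declares an implicitly typed local variable, which is needed here because the anonymous type has no name that could be written in the declaration. Only C# 3.0 features are used, so it should compile against .NET 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension method. The "this" modifier on the first parameter
// lets it be called as if it were an instance method of IEnumerable<string>.
public static class StringSequenceExtensions
{
    public static IEnumerable<string> LongerThan(
        this IEnumerable<string> source, int length)
    {
        foreach (string item in source)
        {
            if (item.Length > length)
            {
                yield return item;
            }
        }
    }
}

public static class LanguageFeaturesDemo
{
    // Returns the words longer than three characters as "word:length" strings.
    public static string[] Describe(IEnumerable<string> words)
    {
        // Generics: the sequences are strongly typed, and the anonymous
        // result type is inferred as the type argument of Select.
        // Lambda expression: w => new { ... } is passed to Select.
        // Anonymous type: new { Word, Length } needs no declared class.
        var pairs = words
            .LongerThan(3)
            .Select(w => new { Word = w, Length = w.Length });

        return pairs.Select(p => p.Word + ":" + p.Length).ToArray();
    }

    public static void Main()
    {
        string[] words = { "LINQ", "to", "Objects", "C#" };

        // Prints "LINQ:4" and "Objects:7".
        foreach (string line in Describe(words))
        {
            Console.WriteLine(line);
        }
    }
}
```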

Standard Query Operators

The System.Linq namespace contains the standard query operators. These are extension methods, most of which are available for any type that implements the generic IEnumerable<T> interface. A few, such as Cast and OfType, extend the non-generic IEnumerable interface, allowing older collections to be converted into sequences that the other operators can query. The methods allow collections to be queried, sorted and aggregated, and they can be chained together to generate more complex queries. The easiest way to understand the query operators is to see them in action. Try executing the following code in a console application, ensuring that you include using directives for the System.IO and System.Linq namespaces.

var folders = Directory.GetDirectories(@"C:\")
    .Where(d => d.Length > 10)
    .OrderBy(d => d.Length);

foreach (string folder in folders)
{
    Console.WriteLine(folder);
}

The above sample is quite simple but performs a task that would otherwise require several lines of code, rather than a single statement. The first statement is the important one. It retrieves the paths of the folders in the root of drive C. The Where operator, with its lambda expression parameter, specifies that only paths longer than ten characters are included. The OrderBy method sorts the results by length, shortest first. The remainder of the code simply outputs the results of the operation.

One of the advantages of LINQ is that it employs a declarative style of programming, whereas traditional C# code is imperative. Imperative programming requires that you specify exactly how an algorithm operates. An imperative version of the previous sample would require you to use a loop and an if statement to determine which folder names should be added to a collection. You would then need to implement a sorting algorithm to order the results. The declarative approach adds a layer of abstraction, allowing you to specify what you wish to achieve without needing to know the underlying algorithms that will be applied.
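To make the contrast concrete, here is a minimal sketch of both styles side by side. They are written as methods that accept any sequence of paths, so they can be tried with sample data rather than the real contents of drive C. The class name, method names and sample paths are hypothetical. The imperative version uses a hand-written insertion sort, chosen because, like OrderBy, it is stable: paths of equal length keep their original relative order, so both methods return the same sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImperativeVersusDeclarative
{
    // Imperative: spell out the loop, the test and the sorting algorithm.
    public static List<string> Imperative(IEnumerable<string> paths)
    {
        // Loop and if statement: collect the paths longer than ten characters.
        var results = new List<string>();
        foreach (string path in paths)
        {
            if (path.Length > 10)
            {
                results.Add(path);
            }
        }

        // Insertion sort by length. Only strictly longer items are shifted,
        // so items of equal length stay in their original order.
        for (int i = 1; i < results.Count; i++)
        {
            string current = results[i];
            int j = i - 1;
            while (j >= 0 && results[j].Length > current.Length)
            {
                results[j + 1] = results[j];
                j--;
            }
            results[j + 1] = current;
        }

        return results;
    }

    // Declarative: describe the result and let the operators do the work.
    public static List<string> Declarative(IEnumerable<string> paths)
    {
        return paths.Where(p => p.Length > 10).OrderBy(p => p.Length).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample paths. On Windows you could pass
        // Directory.GetDirectories(@"C:\") instead.
        string[] paths =
        {
            @"C:\Program Files", @"C:\Users", @"C:\Windows\Temp",
            @"C:\Documents", @"C:\Downloads"
        };

        List<string> imperative = Imperative(paths);
        foreach (string folder in imperative)
        {
            Console.WriteLine(folder);
        }

        // Prints "True": both styles produce the same ordered results.
        Console.WriteLine(imperative.SequenceEqual(Declarative(paths)));
    }
}
```

The imperative method is roughly five times longer, and most of that length describes how the work is done rather than what result is wanted.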

Query Expression Syntax

In addition to the standard query operators, LINQ provides a query expression syntax. This allows queries to be created in a format that some developers find more natural, similar to the structured query language (SQL) used to query databases. The C# compiler translates query expressions into calls to the standard query operators. We can recreate the previous example using the query syntax as follows:

var folders =
    from d in Directory.GetDirectories(@"C:\")
    where d.Length > 10
    orderby d.Length
    select d;

foreach (string folder in folders)
{
    Console.WriteLine(folder);
}

The above query is easy to read. Although it is actually a single statement, I have separated it into several lines to highlight the key operations. Firstly, the data source is specified with the from clause. Secondly, the filter is applied by the where clause. The sort order is determined by the orderby clause and the items to include in the results are specified by the select clause.
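The C# compiler translates a query expression into calls to the standard query operators, which can be seen by writing the same query both ways. In the minimal sketch below, which uses hypothetical class, method and sample names, QuerySyntax contains the query from the example and MethodSyntax contains the equivalent operator calls. Because "select d" returns each item unchanged, the translation needs no Select call. A select clause that transforms the items, as in ProjectedLengths, does become a call to Select.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    // Query expression syntax, matching the example above.
    public static List<string> QuerySyntax(IEnumerable<string> paths)
    {
        var query =
            from d in paths
            where d.Length > 10
            orderby d.Length
            select d;
        return query.ToList();
    }

    // Equivalent method syntax: a Where call followed by an OrderBy call.
    public static List<string> MethodSyntax(IEnumerable<string> paths)
    {
        return paths
            .Where(d => d.Length > 10)
            .OrderBy(d => d.Length)
            .ToList();
    }

    // Here the select clause transforms each item, so this query is
    // equivalent to paths.Where(d => d.Length > 10).Select(d => d.Length).
    public static List<int> ProjectedLengths(IEnumerable<string> paths)
    {
        var query =
            from d in paths
            where d.Length > 10
            select d.Length;
        return query.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample paths, so the demo does not depend on drive C.
        string[] paths = { @"C:\Program Files", @"C:\Users", @"C:\Windows\Temp" };

        // Prints "True", then 16 and 15.
        Console.WriteLine(QuerySyntax(paths).SequenceEqual(MethodSyntax(paths)));
        foreach (int length in ProjectedLengths(paths))
        {
            Console.WriteLine(length);
        }
    }
}
```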

Deferred Execution

At first glance, you might imagine that the LINQ queries described above return a collection of strings as soon as the line of code is processed. In reality, LINQ employs deferred execution, also known as lazy evaluation. Queries that return a single value, such as those using the Count or First operators, do execute immediately. However, queries that return a sequence of items are usually not executed until the results are first enumerated, for example by a foreach loop. You can force such a query to execute immediately by calling an operator such as ToList or ToArray.
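The difference between immediate and deferred execution can be seen in the following sketch, which uses hypothetical class and method names. Three queries are defined against the same list before a fourth item is added. Only the deferred query, which is not evaluated until Count is finally called on it, sees the new item. Calling Count directly and calling ToList both run their queries straight away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionTimingDemo
{
    // Returns the number of items each query reports after "D" is added.
    public static int[] Run()
    {
        var source = new List<string> { "A", "B", "C" };

        // Deferred: this only defines the query; nothing is evaluated yet.
        IEnumerable<string> deferred = source.Where(s => s != "B");

        // Immediate: Count returns a single value, so the query runs now.
        int countNow = source.Count(s => s != "B");

        // Immediate: ToList runs the query now and copies the results.
        List<string> snapshot = source.Where(s => s != "B").ToList();

        source.Add("D");

        // The deferred query is evaluated here, so it includes "D".
        return new[] { deferred.Count(), countNow, snapshot.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine("Deferred query: " + counts[0]);     // 3 (A, C, D)
        Console.WriteLine("Count called early: " + counts[1]); // 2 (A, C)
        Console.WriteLine("ToList snapshot: " + counts[2]);    // 2 (A, C)
    }
}
```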

Deferred execution provides several benefits. If you have a number of queries, each retrieving details from the results of another, the queries can be combined into a single operation. With LINQ to Objects this reduces the memory overhead that would be required for the interim lists and potentially improves the query's performance. For other LINQ providers it may minimise network traffic or database activity. The main disadvantage of this approach is that you can be surprised by the results of a query if the data changes after the query is defined but before it is executed.
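The following sketch, using hypothetical class and method names, makes the combined operation visible. Two queries are chained, the second projecting the results of the first, and each lambda records in a log when it runs. The log shows that nothing happens when the queries are defined, and that each number passes through the filter (and, if it is even, the projection) before the next number is read. No interim list of even numbers is ever built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo
{
    // Runs two chained queries and returns a log of when each step executed.
    public static List<string> Run()
    {
        var log = new List<string>();
        int[] numbers = { 1, 2, 3, 4 };

        // The second query is built on the results of the first.
        var evens = numbers.Where(n => { log.Add("filter " + n); return n % 2 == 0; });
        var squares = evens.Select(n => { log.Add("square " + n); return n * n; });

        // Neither lambda has run yet, so this is the first log entry.
        log.Add("queries defined");

        // Enumerating the final query pulls each item through both steps.
        foreach (int square in squares)
        {
            log.Add("result " + square);
        }

        return log;
    }

    public static void Main()
    {
        // Prints: queries defined, filter 1, filter 2, square 2, result 4,
        // filter 3, filter 4, square 4, result 16 (one entry per line).
        foreach (string entry in Run())
        {
            Console.WriteLine(entry);
        }
    }
}
```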

We can demonstrate deferred execution by running the following sample code. This creates a list of strings containing three items and then defines a query that returns all values from the list. Following the query, another item is added to the original source list and the results of the query are output to the console. If the query had been executed when it was encountered, the "values" variable would contain three results. However, because the query is executed when the results are first read, the additional value is included and four strings are output to the console.

var source = new List<string> { "A", "B", "C" };

var values =
    from s in source
    select s;

source.Add("D");

foreach (string value in values)
{
    Console.WriteLine(value);
}

/* OUTPUT

A
B
C
D

*/

LINQ to Objects Tutorial

The articles in this tutorial will describe the use of LINQ to Objects for querying in-memory data structures. The first group of articles will describe how to construct queries using basic operators and query expression syntax. This will include how information from multiple sources can be joined and how to aggregate numeric data. The later articles will describe groups of related standard query operators.


12 June 2010