
Parallel LINQ (.NET 4.0+)
The fifteenth part of the Parallel Programming in .NET tutorial moves on from imperative programming using loops and tasks. It begins the description of declarative programming using Parallel Language-Integrated Query.
Language-Integrated Query
Language-Integrated Query (LINQ) provides a declarative model that permits querying of sequences of data, such as in-memory collections, XML documents and database data. Unlike the imperative code we have seen so far in this tutorial, when you use LINQ you are concerned with what you are trying to achieve, rather than the mechanics of how you achieve it; LINQ hides the implementation details of looping and conditional statements, allowing querying with lambda expressions, standard query operators and a new query syntax.
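As a brief illustration of the declarative style, the sketch below writes the same query twice. It uses query syntax first and then standard query operators with lambda expressions. Neither version contains an explicit loop or if statement. The class and method names are hypothetical, and the code assumes a console application targeting .NET Framework 4.0 or later.

```csharp
using System;
using System.Linq;

class LinqSyntaxDemo
{
    // Query syntax: states what to select (squares of even numbers),
    // not how to loop over the array.
    public static int[] EvenSquaresQuerySyntax(int[] numbers)
    {
        var query = from n in numbers
                    where n % 2 == 0
                    select n * n;
        return query.ToArray();
    }

    // The same query using the Where and Select standard query operators.
    public static int[] EvenSquaresMethodSyntax(int[] numbers)
    {
        return numbers.Where(n => n % 2 == 0)
                      .Select(n => n * n)
                      .ToArray();
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 10).ToArray();
        Console.WriteLine(string.Join(" ", EvenSquaresQuerySyntax(numbers)));  // 4 16 36 64 100
        Console.WriteLine(string.Join(" ", EvenSquaresMethodSyntax(numbers))); // 4 16 36 64 100
    }
}
```

The compiler translates query syntax into calls to the standard query operators, so the two methods are equivalent.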
The nature of many queries means that they can be easily parallelised. Most queries perform the same group of actions for each item in a collection. If those actions are independent, so that processing one item has no side effects on another and the order of processing does not matter, you can often achieve a large performance increase by dividing the work between several processor cores. To support these scenarios, .NET Framework 4.0 introduced Parallel LINQ (PLINQ).
PLINQ provides the same standard query operators and query expression syntax as LINQ. The key difference is that the source data can be broken into sections using data decomposition. These smaller groups of data can then be processed concurrently, potentially using all of the available CPU cores. As we will see, converting a LINQ query to its parallel counterpart requires only a trivial change.
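To show that the query expression syntax is unchanged, the following sketch applies an ordinary LINQ query expression to a parallel source. It assumes the AsParallel method, described later in this article, to obtain that source. The class and method names are hypothetical. Because PLINQ may return results in any order, the sketch sorts them before returning so that the output is predictable.

```csharp
using System;
using System.Linq;

class PlinqQuerySyntaxDemo
{
    public static int[] EvenSquares(int[] numbers)
    {
        // Same query expression as in LINQ; only the source differs.
        var query = from n in numbers.AsParallel()
                    where n % 2 == 0
                    select n * n;

        // Results may arrive in any order, so sort them for predictable output.
        int[] results = query.ToArray();
        Array.Sort(results);
        return results;
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 10).ToArray();
        Console.WriteLine(string.Join(" ", EvenSquares(numbers))); // 4 16 36 64 100
    }
}
```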
PLINQ does have some limitations, which mean that it is not a direct replacement for LINQ and cannot be the default option for querying. Chief amongst these is that side effects of processing individual items from the source sequence, such as the result of processing one item depending upon another, can cause unpredictable results. This is because source items will not usually be processed in their original order. A second limitation is that PLINQ only provides parallelism for in-memory data, such as collections or pre-loaded XML. Other data sources will be processed sequentially. For example, LINQ to SQL generates SQL statements that are passed to SQL Server, which executes them and returns the results. However, further processing of those returned results can be parallelised with PLINQ.
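The first limitation can be illustrated with a minimal sketch, assuming a console application and using hypothetical names. In the first method, the lambda passed to Select updates a running total that is shared by every worker thread without synchronisation. Updates can be lost, so the total may differ from run to run. The second method avoids shared state and lets PLINQ combine the partial results itself.

```csharp
using System;
using System.Linq;

class SideEffectDemo
{
    // Risky: the lambda has a side effect on a variable shared by all threads.
    // "total += x" is not atomic, so concurrent updates can be lost.
    public static long UnsafeTotal(int[] numbers)
    {
        long total = 0;
        numbers.AsParallel()
               .Select(x => { total += x; return x; })
               .ToArray(); // forces the deferred query to execute
        return total;
    }

    // Safer: each item is processed independently, with no shared state.
    public static long SafeTotal(int[] numbers)
    {
        return numbers.AsParallel().Sum(x => (long)x);
    }

    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();
        Console.WriteLine("Unsafe total: " + UnsafeTotal(numbers)); // may vary between runs
        Console.WriteLine("Safe total:   " + SafeTotal(numbers));   // 500000500000
    }
}
```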
AsParallel
To show how a LINQ query can be modified to be processed in parallel, we first need a sequential version. The code below shows a very simple query. Here we start with an array containing the integers from one to ten. Using the Select standard query operator, we project this to a new sequence containing the squares of the original values. As LINQ uses deferred execution, the new sequence is not generated until the data is accessed. This means that the query is evaluated when the foreach loop enumerates it, and the loop then outputs the results.
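using System;
using System.Linq;

int[] sequence = Enumerable.Range(1, 10).ToArray();
var squares = sequence.Select(x => x * x);   // deferred: nothing is computed yet
foreach (var square in squares)              // enumerating the query executes it
{
    Console.Write(square + " ");
}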
/* OUTPUT
1 4 9 16 25 36 49 64 81 100
*/
LINQ works with sequences that implement the IEnumerable<T> interface. To signify that we wish to use PLINQ, we must obtain a version of the source sequence that supports parallel querying. To do so, we can use the static AsParallel method of the ParallelEnumerable class. This is an extension method of IEnumerable<T>, so it can be applied to any sequence that supports LINQ operations. It returns an object of type ParallelQuery<T>.
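Because AsParallel is an extension method, it can be called either on the sequence itself or as an ordinary static method of ParallelEnumerable. The sketch below, with hypothetical class and method names, shows both forms applied to a List<int> rather than an array, since any IEnumerable<T> is accepted. Both forms return a ParallelQuery<int>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AsParallelDemo
{
    // Extension method syntax.
    public static ParallelQuery<int> ViaExtensionSyntax(IEnumerable<int> source)
    {
        return source.AsParallel();
    }

    // The same method called explicitly as a static member of ParallelEnumerable.
    public static ParallelQuery<int> ViaStaticCall(IEnumerable<int> source)
    {
        return ParallelEnumerable.AsParallel(source);
    }

    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        Console.WriteLine(ViaExtensionSyntax(list).Sum()); // 6
        Console.WriteLine(ViaStaticCall(list).Sum());      // 6
    }
}
```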
Once you have the parallel data sequence you can use it as the source for LINQ operations as you would any other sequence. The execution of queries is still deferred and the individual results are the same. However, in the background PLINQ decomposes the data in a manner that allows efficient parallel processing.
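Deferred execution can be observed directly. In the minimal sketch below, which uses hypothetical names, the source array is modified after the PLINQ query has been defined but before it is enumerated. The results reflect the change, which shows that the query only runs when ToArray is called.

```csharp
using System;
using System.Linq;

class DeferredDemo
{
    public static int[] Run()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        // Defining the parallel query does not execute it.
        var squares = sequence.AsParallel().Select(x => x * x);

        // Change the source after the query has been defined.
        sequence[0] = 11;

        // The query executes here, so it sees 11 instead of 1.
        return squares.ToArray();
    }

    static void Main()
    {
        int[] results = Run();
        Console.WriteLine(results.Contains(121)); // True
        Console.WriteLine(results.Contains(1));   // False
    }
}
```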
To parallelise the previous query, add the call to AsParallel as shown below:
var squares = sequence.AsParallel().Select(x => x * x);
// Possible output (the order may vary between runs): 1 36 4 49 9 64 16 81 25 100
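The following sketch puts the sequential and parallel versions side by side in one console program. The class and method names are hypothetical. The parallel results may appear in a different order on each run. Once they are sorted, however, they contain exactly the same values as the sequential query, which shows that only the processing order differs.

```csharp
using System;
using System.Linq;

class SquaresDemo
{
    public static int[] SequentialSquares(int[] sequence)
    {
        return sequence.Select(x => x * x).ToArray();
    }

    public static int[] ParallelSquaresOf(int[] sequence)
    {
        // The only change from the sequential version is the AsParallel call.
        return sequence.AsParallel().Select(x => x * x).ToArray();
    }

    static void Main()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        int[] sequential = SequentialSquares(sequence);
        int[] parallel = ParallelSquaresOf(sequence);

        // The order of the parallel results can vary from run to run.
        Console.WriteLine(string.Join(" ", parallel));

        // Once sorted, both queries contain the same values.
        Array.Sort(parallel);
        Console.WriteLine(sequential.SequenceEqual(parallel)); // True
    }
}
```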
29 November 2011