BlackWasp™

Parallel and Asynchronous
.NET 4.0+

Parallel LINQ

The fifteenth part of the Parallel Programming in .NET tutorial leaves behind the examination of imperative programming using loops and tasks. It begins the description of declarative programming using Parallel Language-Integrated Query.

Language-Integrated Query

Language-Integrated Query (LINQ) provides a declarative model that permits querying of sequences of data, such as in-memory collections, XML documents and database data. Unlike the imperative code we have seen so far in this tutorial, when you use LINQ you are concerned with what you are trying to achieve, rather than the mechanics of how you achieve it; LINQ hides the implementation details of looping and conditional statements, allowing querying with lambda expressions, standard query operators and a new query syntax.

The nature of many queries means that they can be easily parallelised. Most queries perform the same group of actions for each item in a collection. If all of those actions are independent, with no side effects that depend on the order in which the items are processed, you can often achieve a large performance increase by dividing the work between several processor cores. To support these scenarios, version 4.0 of the .NET Framework introduced Parallel LINQ (PLINQ).

PLINQ provides the same standard query operators and query expression syntax as LINQ. The key difference is that the source data can be broken into sections using data decomposition. These smaller groups of data can then be processed concurrently by the available CPU cores. As we will see, converting a LINQ query to its parallel counterpart requires a trivial change.

PLINQ does have some limitations that mean it is not a direct replacement for LINQ and cannot be the default option for querying. Key amongst these is that side effects of processing individual items from source sequences, such as the result of processing one item being dependent upon another, can cause unpredictable results. This is because source items will not usually be processed in their original order. A second limitation is that PLINQ only provides parallelism for in-memory data, such as collections or pre-loaded XML. Other data sources will be processed sequentially. For example, LINQ to SQL generates SQL statements that are passed to SQL Server, which returns the results. However, further processing of those returned results can be parallelised with PLINQ.
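The first limitation can be illustrated with a small sketch. The class and method names below (SideEffectDemo, RunningTotals, ParallelSum) are hypothetical and chosen only for this illustration. The running total depends on every earlier item, so it is kept sequential, whereas a plain sum does not depend on processing order and can safely use the AsParallel method that is described in the next section.

```csharp
using System;
using System.Linq;

public static class SideEffectDemo
{
    // Each result depends on all of the earlier items, so the order matters.
    // Adding AsParallel() to this query would let several threads update
    // 'total' at once, giving unpredictable results.
    public static int[] RunningTotals(int[] source)
    {
        int total = 0;
        return source.Select(x => total += x).ToArray();
    }

    // A sum does not depend on the order in which items are processed,
    // so it is a reasonable candidate for parallel execution.
    public static int ParallelSum(int[] source)
    {
        return source.AsParallel().Sum();
    }

    public static void Main()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        Console.WriteLine(string.Join(" ", RunningTotals(sequence))); // 1 3 6 10 15 21 28 36 45 55
        Console.WriteLine(ParallelSum(sequence));                     // 55
    }
}
```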

AsParallel

To show how a LINQ query can be modified to be processed in parallel we first need a sequential version. The code below shows a very simple query. Here we start with an array containing the integers from one to ten. Using the Select standard query operator we project this to a new sequence containing the squares of the original values. As LINQ uses deferred execution, the new sequence is not generated until the data is accessed. This means that the foreach loop causes the query to be evaluated and outputs the results.

int[] sequence = Enumerable.Range(1, 10).ToArray();

var squares = sequence.Select(x => x * x);

foreach (var square in squares)
{
    Console.Write(square + " ");
}

/* OUTPUT

1 4 9 16 25 36 49 64 81 100

*/

LINQ works with sequences that implement the IEnumerable<T> interface. To signify that we wish to use PLINQ, we must ensure that the source sequence supports parallelism. To do so, we can use the static AsParallel method of the ParallelEnumerable class. This is an extension method of IEnumerable<T>, so it can be applied to any sequence that supports LINQ operations. It returns an object of the type ParallelQuery<T>.
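As a minimal sketch of the types involved, the code below declares them explicitly rather than using var. The class and method names (AsParallelDemo, ToParallel) are hypothetical. Note that ParallelQuery<T> also implements IEnumerable<T>, so a parallel query can still be passed to code that expects an ordinary sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsParallelDemo
{
    // AsParallel is an extension method, so it accepts any IEnumerable<T>.
    public static ParallelQuery<int> ToParallel(IEnumerable<int> source)
    {
        return source.AsParallel();
    }

    public static void Main()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        // The parallel source and the query built on it are both ParallelQuery<int>.
        ParallelQuery<int> parallelSequence = ToParallel(sequence);
        ParallelQuery<int> squares = parallelSequence.Select(x => x * x);

        // A ParallelQuery<int> can be used wherever an IEnumerable<int> is expected.
        IEnumerable<int> asEnumerable = squares;

        Console.WriteLine(asEnumerable.Count()); // 10
    }
}
```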

Once you have the parallel data sequence you can use it as the source for LINQ operations as you would any other sequence. The execution of queries is still deferred and the individual results are the same. However, in the background PLINQ decomposes the data in a manner that allows efficient parallel processing.

To parallelise the previous query, add the call to AsParallel as shown below:

var squares = sequence.AsParallel().Select(x => x * x);

// Example output (the order may differ between runs): 1 36 4 49 9 64 16 81 25 100

Order Preservation

You may have noticed that the individual results of the previous query were correct but that they appeared in a different order than when using the sequential version of LINQ. This is a by-product of the data decomposition. In the above results, the first two results were provided by two different processor cores. To achieve the best performance, the results were added to the generated sequence in the order that they were produced.
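One way to check that only the order differs, and not the values, is to sort the parallel results and compare them with the sequential ones. The following is a sketch under that assumption; the class and method names (OrderCheckDemo, ParallelSquares) are hypothetical.

```csharp
using System;
using System.Linq;

public static class OrderCheckDemo
{
    // Runs the unordered parallel query and captures the results in an array.
    public static int[] ParallelSquares(int[] source)
    {
        return source.AsParallel().Select(x => x * x).ToArray();
    }

    public static void Main()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        int[] sequential = sequence.Select(x => x * x).ToArray();
        int[] parallel = ParallelSquares(sequence);

        // The order of this line may vary from run to run.
        Console.WriteLine(string.Join(" ", parallel));

        // After sorting, the parallel results match the sequential results.
        bool sameValues = parallel.OrderBy(x => x).SequenceEqual(sequential);
        Console.WriteLine(sameValues); // True
    }
}
```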

In some cases the ordering of results does not matter, especially if the data is later sorted using the OrderBy standard query operator or by some imperative method. In other cases a corruption of the original order can be disastrous. For example, if you are using the partitioning operators to page through data you may find that the same page contains different results for each execution of a query. Some items may appear to be duplicated on several pages whilst others don't appear. In such situations you will want to preserve the ordering of results to match those of the input sequence.

PLINQ does support order preservation of a parallel data source using the AsOrdered method. You should only use this method when it is essential to maintain the order of results, as it can significantly lower the performance of your queries.

The following query uses the AsOrdered method after AsParallel to preserve the order of the results.

var squares = sequence.AsParallel().AsOrdered().Select(x => x * x);

// 1 4 9 16 25 36 49 64 81 100
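The paging scenario mentioned earlier can be sketched in the same way. The helper below is hypothetical (PagingDemo and GetPage are names invented for this illustration) and assumes a zero-based page index. Because AsOrdered is applied before the Skip and Take partitioning operators, each page should contain the same items on every execution.

```csharp
using System;
using System.Linq;

public static class PagingDemo
{
    // Returns one page of squared values, preserving the source order.
    // pageIndex is zero-based.
    public static int[] GetPage(int[] source, int pageIndex, int pageSize)
    {
        return source.AsParallel()
                     .AsOrdered()
                     .Select(x => x * x)
                     .Skip(pageIndex * pageSize)
                     .Take(pageSize)
                     .ToArray();
    }

    public static void Main()
    {
        int[] sequence = Enumerable.Range(1, 10).ToArray();

        Console.WriteLine(string.Join(" ", GetPage(sequence, 0, 5))); // 1 4 9 16 25
        Console.WriteLine(string.Join(" ", GetPage(sequence, 1, 5))); // 36 49 64 81 100
    }
}
```

Removing the AsOrdered call from GetPage would still compile, but the contents of each page would no longer be guaranteed to be the same on each run.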
29 November 2011