Synchronisation in Parallel Loops

The fifth part of the Parallel Programming in .NET tutorial continues the description of parallel loops. This article looks at the synchronisation problems of aggregation in parallel loops and how to overcome them with locking and local state values.

Synchronisation Using Locking

The simplest way to add synchronisation to the loop is to create a lock that prevents concurrent access to the aggregation value. Locking in C# is achieved with the lock statement, which takes the object that controls the lock as its only parameter. The code to execute whilst holding the lock is placed in the lock statement's code block. When another thread reaches a lock statement based upon the same underlying object, that thread is blocked until the first lock's code block is exited.
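
To see this blocking behaviour in isolation, consider the sketch below, which assumes the System and System.Threading namespaces are imported, as in a typical console application. Two threads contend for the same lock object; whichever arrives at the lock statement second waits until the first has left its code block.

object sync = new object();

ThreadStart work = () =>
{
    lock (sync)
    {
        Console.WriteLine("Thread {0} acquired the lock", Thread.CurrentThread.ManagedThreadId);
        Thread.Sleep(1000);     // simulate work whilst holding the lock
    }
    Console.WriteLine("Thread {0} released the lock", Thread.CurrentThread.ManagedThreadId);
};

var first = new Thread(work);
var second = new Thread(work);
first.Start();
second.Start();
first.Join();
second.Join();

// Whichever thread enters the lock first holds it for a second; the other
// is blocked at the lock statement, so its output appears roughly a second later.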

The inconsistency that we see in the aggregation in the parallel loop is caused by two or more threads reading the value, updating their local copies and storing the new value again. In some cases one thread's change to the aggregation value is overwritten by another, leading to a result that is too low. We can prevent this by adding locking around the statement that reads and modifies the total variable. NB: You should apply locking to the minimum amount of code possible to reduce unnecessary blocking.
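
For reference, an unsynchronised version of the loop, along the lines of the example from the previous part of the tutorial, looks something like the sketch below. The final total varies between runs and is usually lower than the correct figure.

long total = 0;

Parallel.ForEach(Enumerable.Range(1, 100000000), value =>
{
    // Unsynchronised read-modify-write: one thread's update can be
    // overwritten by another's, so values are lost from the total.
    total += value;
});

// total is usually much lower than the correct value of 5000000050000000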

The updated version of the loop with locking added is shown below:

object sync = new object();
long total = 0;

Parallel.ForEach(Enumerable.Range(1, 100000000), value =>
{
    lock (sync)
    {
        total += value;
    }
});

// total = 5000000050000000

When you run the code with locking, the result of the aggregation is correct. However, the performance of the loop is substantially slower. This is because the entire body of the loop is within the lock, so the threads spend much of their time blocked, waiting to acquire the lock, rather than performing useful work.
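
If you wish to see the difference on your own hardware, one simple approach is to time the loop using the Stopwatch class from the System.Diagnostics namespace. The sketch below wraps the locked loop from the previous example; you can compare the elapsed time against that of the unsynchronised or sequential equivalents.

var stopwatch = System.Diagnostics.Stopwatch.StartNew();

object sync = new object();
long total = 0;

Parallel.ForEach(Enumerable.Range(1, 100000000), value =>
{
    lock (sync)
    {
        total += value;
    }
});

stopwatch.Stop();
Console.WriteLine("Total {0} calculated in {1}ms", total, stopwatch.ElapsedMilliseconds);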

Local Loop State

Some algorithms require that you use locking as in the previous example, severely decreasing the benefit of parallel processing. However, in many cases you can adjust your algorithm to reduce the blocking and increase the benefits of parallelism. Summing a series of values is one of these cases, as you can decompose the data into sections that can be individually summed and then combine these results at the end of the process to produce the final figure. If each section is processed sequentially, no locking is required during the loop's iterations.
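
As a simple illustration of the idea, and not the mechanism that the parallel loops themselves use, the sketch below splits the range into two halves, sums each half in its own task with no locking, and then combines the two partial results. It assumes the System.Threading.Tasks namespace, as used elsewhere in the tutorial. The thread-local data described next generalises this approach and lets the loop decide how to partition the data.

var firstHalf = Task.Factory.StartNew(() =>
{
    long sum = 0;
    for (long i = 1; i <= 50000000; i++) sum += i;    // sum the first half sequentially
    return sum;
});

var secondHalf = Task.Factory.StartNew(() =>
{
    long sum = 0;
    for (long i = 50000001; i <= 100000000; i++) sum += i;    // sum the second half sequentially
    return sum;
});

// Combining the two partial sums is the only step that touches shared data.
long total = firstHalf.Result + secondHalf.Result;

// total = 5000000050000000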

The parallel For and ForEach loops provide a mechanism for this type of decomposition known as thread-local data. You can create a variable that is local to a single thread and that is available for each iteration of the loop for one decomposed section of data. This value is initialised when the section of data is first accessed. Each iteration of the loop modifies the thread-local value and when the entire subset of data has been processed, you can perform a final action using the result.

To include thread-local data with the ForEach method we can use an overload that has four parameters. The first is the collection that is to be enumerated. The second parameter accepts a Func delegate that returns the initial value for the local variable. In our summing example we will return a 64-bit integer value of zero using the following lambda expression:

() => 0L

The third parameter is a Func delegate that represents the body of the loop. The delegate has parameters that provide the next item in the collection, a ParallelLoopState object and the current value of the thread-local variable. During each iteration the local variable is updated and the new value is returned by the delegate, as shown below. Note that no locking is required because the items in each section of data are processed sequentially.

(value, pls, localTotal) => { return localTotal += value; }

The fourth parameter is an Action delegate. This is called when a decomposed section of the collection has been fully processed. The delegate receives the final value of the thread-local data. In our summing example this value is added to the overall total. You will see that this final action is performed within a lock statement as it is possible that several threads could run the final stage simultaneously and cause synchronisation errors. Locking prevents these errors and is less likely to cause blocking problems as the final action is performed less frequently than in the previous example.

localTotal => { lock (sync) { total += localTotal; } }

The final code for the loop with thread-local state is shown below.

object sync = new object();
long total = 0;

Parallel.ForEach(Enumerable.Range(1, 100000000),
    () => 0L,
    (value, pls, localTotal) =>
    {
        return localTotal += value;
    },
    localTotal =>
    {
        lock (sync)
        {
            total += localTotal;
        }
    });

// total = 5000000050000000
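
As mentioned earlier, the parallel For loop provides an equivalent overload with thread-local data. A sketch of the same calculation using Parallel.For might look like the following; note that the second argument is an exclusive upper bound, so it is one greater than the last value to be summed.

object sync = new object();
long total = 0;

Parallel.For(1, 100000001,
    () => 0L,
    (value, pls, localTotal) =>
    {
        return localTotal += value;
    },
    localTotal =>
    {
        lock (sync)
        {
            total += localTotal;
        }
    });

// total = 5000000050000000
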
1 September 2011