BlackWaspTM

This web site uses cookies. By using the site you accept the cookie policy.This message is for compliance with the UK ICO law.

LINQ
.NET 3.5+

A LINQ Style Partition Operator

Sometimes a sequence of values needs to be split into partitions, batches or chunks. This article describes an extension method that performs such an operation, accepting a collection of any type and returning a set of sequences from its contents.

Partitioning

Language-Integrated Query (LINQ) provides several standard query operators that allow you to partition a sequence. These include Take and Skip, which split a sequence into two parts and return a new series containing the elements from one of those parts.

In this article we'll create a new partitioning operator that will allow you to break a sequence into a number of parts and return all of these parts as a new sequence of sequences. The operator will generate sequences that are all of a specified size, except for the final set, which may be shorter. The elements in the sequences will appear in the same order as in the source collection.

For example, the numbers from one to ten may be partitioned into sequences of three items. The first returned sequence would contain the values from one to three. The second would hold the numbers four to six. The third sequence would hold seven to nine and the final sequence would contain just the ten.

Creating the Class

We'll be creating the Partition operator as an extension method so that it behaves in a similar manner to the standard query operators. For this we need a static class:

public static class PartitionExtensions
{
}

Creating the Extension Method

The new operator should work with any sequence that implements IEnumerable<T>, so the first parameter accepts this data type and is decorated with the this keyword. The only other argument will be used to specify the number of items in each of the resultant partition sequences. The method returns a sequence of sequences, each of the IEnumerable<T> type, so the return type is IEnumerable<IEnumerable<T>>.

To allow us to validate the parameters immediately that calls to Partition are encountered, the method checks that the sequence is not null and that the provided size is sensible. It then calls another private method, PartitionImpl.

public static IEnumerable<IEnumerable<T>> Partition<T>(
        this IEnumerable<T> sequence, int size)
{
    if (sequence == null) throw new ArgumentNullException("sequence");
    if (size <= 0) throw new ArgumentException("Partition size must be one or greater");

    return PartitionImpl(sequence, size);
}

Adding the Iterator Method

We can now add PartitionImpl. This iterator method breaks the source sequence into partitions and provides deferred execution; each sub-sequence will only be generated as it is read from the results.

Start by adding the signature for the method. It is identical to the signature of Partition, except for the name and the absence of the this keyword, as we do not want to create another extension method.

private static IEnumerable<IEnumerable<T>> PartitionImpl<T>(
    IEnumerable<T> sequence, int size)
{
}

We'll build the partitions in an array of the correct size. This array will act as a buffer that fills up as we enumerate the source data. Once full, the buffer will be returned. To create the array and a pointer that will be used whilst filling the buffer, add the following variables to the PartitionImpl method. Note that we start with the partition buffer set to null. We'll be creating the buffer arrays later.

T[] partition = null;
int pointer = 0;

We can now perform the enumeration of the source sequence. In the code below you can see that we start by getting an enumerator and then use a while loop to process all of the items from the sequence. Within the loop we create a new buffer array for the next partition's items, if no array is yet set. We'll create a new buffer each time the old one gets full.

The next three lines of code copy an element of the source data into the correct position in the buffer before advancing the pointer. If the pointer reaches the maximum size for a partition it is reset by the modulus operator.

The if statement detects a pointer value of zero, which means that the buffer is full. When full, the buffer is yielded and the partition variable is set to null. If there is a subsequent iteration of the loop, a new buffer will be created. We can't re-use the array reference as this will affect the overall results.

var enumerator = sequence.GetEnumerator();

while (enumerator.MoveNext())
{
    if (partition == null)
    {
        partition = new T[size];
    }

    partition[pointer] = enumerator.Current;
    pointer++;
    pointer %= size;

    if (pointer == 0)
    {
        yield return partition;
        partition = null;
    }
}

At the end of the while loop it is possible that we have a partially filled buffer array that has yet to be returned. This should become the final partition in the overall results. To complete the PartitionImpl method we therefore check for a null buffer array. If the array is not null, we need to return the number of items that have been added to it so far. We don't return the full array, which will end with default values for the generic type. We can use the Take standard query operator with the current pointer value for this purpose.

if (partition != null)
{
    yield return partition.Take(pointer);
}

Testing the Method

To test the method let's create a sequence of integers and break it into partitions. The following code uses Enumerable.Range to generate an ascending range of ten numbers. These are split into partitions of three items, with the final collection containing a single item. Try executing the code and trying different source sequences and partition sizes.

var values = Enumerable.Range(1, 10);

var partitions = values.Partition(3).ToList();

Console.WriteLine("{0} partition(s)", partitions.Count);

foreach (var partition in partitions)
{
    Console.Write("{");
    foreach (var value in partition)
    {
        Console.Write(" {0}", value);
    }
    Console.WriteLine(" }");
}

/* OUTPUT

4 partition(s)
{ 1 2 3 }
{ 4 5 6 }
{ 7 8 9 }
{ 10 }

*/
6 February 2013