BlackWaspTM

This web site uses cookies. By using the site you accept the cookie policy.This message is for compliance with the UK ICO law.

LINQ
.NET 3.5+

A LINQ Style Median Operator

Language Integrated Query (LINQ) includes an operator that calculates the average value of a sequence containing numerical data. The Average method calculates the mean of the sequence. This article describes an operator that determines the median.

Calculating the Median for a Sequence of Any Type

The last two overloaded versions of the Average method that work with decimals allow you to calculate the mean of a set of values of any type. In addition to providing the sequence to the first parameter you also pass a selector delegate, usually in the form of a lambda expression. The function is executed against each item to generate a decimal result. The method then averages these results.

To replicate this syntax for the Median operator we can create two method signatures. The first of these defines a generic method that operates upon a sequence of any type. The selector function must include a parameter of the generic type and must return a decimal value. The signature for the method is:

public static decimal Median<T>(this IEnumerable<T> source, Func<T, decimal> selector)
{
}

The second version with a selector is almost identical. The difference is that the return value of the method and of the Func delegate are both nullable decimals:

public static decimal? Median<T>(this IEnumerable<T> source, Func<T, decimal?> selector)
{
}

For both methods we can utilise the previously created overloads to calculate the median. The new methods apply the selector function to each item in the sequence using the LINQ Select operator and pass the modified sequence to the appropriate Median method. The code for both new methods is identical:

return source.Select(x => selector(x)).Median();

A Final Problem

There is a problem with the Median method. When the Count method is called to determine the number of items in the source sequence, the sequence is enumerated in full. Later, each time the ElementAt operator is used, the sequence is processed again. This means that every time you call Median, the source sequence will be read two or three times. If the sequence has slow performance, the Median method will be unnecessarily slow. If the sequence can only be read once, returns different results each time or has side effects when read, the Median method may fail, yield an incorrect result or cause unexpected problems.

We can solve this by reading the entire sequence into memory once and working on our cached copy. This increases the memory footprint of the Median call but removes the problems described above. The alternative version of the method is as follows:

public static decimal Median(this IEnumerable<decimal> source)
{
    var cached = source.ToList();               // Cache the sequence
    int decimals = cached.Count();              // Use the cached version
    if (decimals != 0)
    {
        var midpoint = (decimals - 1) / 2;
        var sorted = cached.OrderBy(n => n);    // Use the cached version
        var median = sorted.ElementAt(midpoint);

        if (decimals % 2 == 0)
        {
            median = (median + sorted.ElementAt(midpoint + 1)) / 2;
        }

        return median;
    }
    else
    {
        throw new InvalidOperationException("Sequence contains no elements");
    }
}
12 March 2011