BlackWasp™


LINQ
.NET 3.5+

A LINQ Style Median Operator

Language Integrated Query (LINQ) includes the Average operator, which calculates the mean of a sequence of numeric values. This article describes a similar operator that determines the median.

Mean and Median

The LINQ Average operator calculates the arithmetic mean of a sequence of numbers, or of a sequence from which numbers can be extracted or calculated. The mean is found by totalling the values and dividing the total by the number of items. Using the mean is often appropriate, but in some cases the result can be skewed by atypically high or low values. In these cases, when you want a typical value for the collection, you may decide to use an alternative statistic, such as the median.

The median is the value that falls in the middle of the data when the set is sorted. If the sequence has an even number of values, the median is the mean of the two central values. The median is useful for finding a typical value in a set that contains extremes. For example, consider a company with five employees, four of whom earn £10,000 and one who earns £1,000,000. The mean salary is £208,000 (£1,040,000 divided by five), which is not typical of what a new starter could expect to earn. The median salary is £10,000, which is typical.
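The salary figures can be checked with a short C# sketch. It uses the standard Average operator for the mean and, because the count is odd, simply takes the middle element of the sorted salaries as the median. The array name is illustrative only, and the listing assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Salaries from the example: four employees on 10,000 and one on 1,000,000.
decimal[] salaries = { 10000m, 10000m, 10000m, 10000m, 1000000m };

// Mean: (4 * 10,000 + 1,000,000) / 5 = 1,040,000 / 5.
decimal mean = salaries.Average();

// Median: five values, so the middle one sits at zero-based index 2 after sorting.
decimal median = salaries.OrderBy(s => s).ElementAt(salaries.Length / 2);

Console.WriteLine(mean);   // 208000
Console.WriteLine(median); // 10000
```

The single extreme salary moves the mean a long way but leaves the median unchanged.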

Average Operator

In this article we will create an extension method that calculates the median of a collection of values using a syntax similar to that of the standard Average operator. The Average method has many overloads, allowing you to work with integers, floating-point values, nullable numeric values and other types that can be converted to a numeric value with a selector function. In this article I will describe four overloads that all work with decimal values. These are duplicated for the other numeric data types in the downloadable sample code.
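For reference, the sketch below calls three of the standard Average overloads that the Median method imitates: the decimal overload, an integer overload (which returns a double) and a selector overload that projects each element to a decimal. The order data is made up for illustration, and the listing assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

decimal[] amounts = { 1m, 2m, 3m, 6m };
int[] counts = { 1, 2 };
var orders = new[]
{
    new { Id = 1, Total = 10m },
    new { Id = 2, Total = 20m },
};

// Average(IEnumerable<decimal>) returns a decimal: 12 / 4.
decimal meanAmount = amounts.Average();           // 3

// Average(IEnumerable<int>) returns a double.
double meanCount = counts.Average();              // 1.5

// Average<TSource>(IEnumerable<TSource>, Func<TSource, decimal>) uses a selector.
decimal meanTotal = orders.Average(o => o.Total); // 15

Console.WriteLine($"{meanAmount} {meanCount} {meanTotal}");
```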

Creating the Class

To begin, we need a static class to hold the extension methods. Create a new project and add a class named "MedianExtensions". Modify the class's definition as follows:

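using System;
using System.Collections.Generic;
using System.Linq; // Count, OrderBy, ElementAt, Where and Cast

public static class MedianExtensions
{
}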

Calculating the Median for a Sequence of Decimals

The first method calculates the median of a sequence of decimal values and returns a decimal result. To mirror the functionality of the Average operator, the method must accept any sequence that implements the IEnumerable<decimal> interface. The method's signature is therefore:

public static decimal Median(this IEnumerable<decimal> source)
{
}

The first step inside the method is to obtain the number of items in the sequence. For this we can use the Count operator. In the equivalent Average method, trying to process an empty sequence causes an InvalidOperationException. We can duplicate this with an if statement, as follows:

int decimals = source.Count();
if (decimals != 0)
{
}
else
{
    throw new InvalidOperationException("Sequence contains no elements");
}
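The behaviour being copied is easy to confirm. This small sketch calls the standard decimal Average overload on an empty array and records whether it throws InvalidOperationException, the exception type the Median method replicates. The flag name is illustrative, and the listing assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

bool threw = false;
try
{
    // The mean of an empty decimal sequence is undefined, so Average throws.
    decimal mean = new decimal[] { }.Average();
    Console.WriteLine(mean);
}
catch (InvalidOperationException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}
```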

If the sequence contains elements, we can now determine the median for sequences with an odd number of items. First we calculate the midpoint of the sequence by subtracting one from the number of items, because ElementAt uses zero-based indexes, and halving the result with integer division. After sorting the items with the OrderBy method, we can extract the item at the midpoint using LINQ's ElementAt operator. To do this, add the following code within the if statement's empty code block:

var midpoint = (decimals - 1) / 2;
var sorted = source.OrderBy(n => n);
var median = sorted.ElementAt(midpoint);
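The midpoint expression relies on integer division, which truncates. The sketch below wraps it in a hypothetical local function named Midpoint and prints the zero-based indexes that would be read for counts from one to six; for even counts, the second index is the one used by the code that follows. It assumes C# 9 or later for top-level statements.

```csharp
using System;

// Hypothetical helper mirroring the expression (decimals - 1) / 2.
static int Midpoint(int count) => (count - 1) / 2;

for (int count = 1; count <= 6; count++)
{
    int midpoint = Midpoint(count);
    string indexes = count % 2 == 0
        ? $"{midpoint} and {midpoint + 1}"   // even: two central values
        : $"{midpoint}";                     // odd: one central value
    Console.WriteLine($"Count {count}: index {indexes}");
}
// Count 1: index 0
// Count 2: index 0 and 1
// Count 3: index 1
// Count 4: index 1 and 2
// Count 5: index 2
// Count 6: index 2 and 3
```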

If the sequence has an even number of items we also need to extract the item following the midpoint and find the mean of the two values. This is achieved with a second if statement:

if (decimals % 2 == 0)
{
    median = (median + sorted.ElementAt(midpoint + 1)) / 2;
}

return median;
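Assembled, the fragments above give the following complete method. The using directives and the two usage lines at the top are added here only so that the listing compiles on its own as a C# 9 or later program with top-level statements; the method body is unchanged apart from comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage (top-level statements must precede type declarations).
decimal odd = new decimal[] { 1, 2, 3, 4, 5 }.Median();     // 3
decimal even = new decimal[] { 3, 5, 1, 4, 2, 6 }.Median(); // 3.5
Console.WriteLine($"{odd} {even}");

public static class MedianExtensions
{
    public static decimal Median(this IEnumerable<decimal> source)
    {
        // Reject empty sequences with the same exception type Average uses.
        int decimals = source.Count();
        if (decimals != 0)
        {
            // Zero-based index of the (lower) central value.
            var midpoint = (decimals - 1) / 2;
            var sorted = source.OrderBy(n => n);
            var median = sorted.ElementAt(midpoint);

            // Even count: take the mean of the two central values.
            if (decimals % 2 == 0)
            {
                median = (median + sorted.ElementAt(midpoint + 1)) / 2;
            }

            return median;
        }
        else
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }
    }
}
```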

Testing the Method

We can test the Median method by creating several arrays of decimals and calculating the median for each:

var median1 = new decimal[] { 1, 2, 3, 4, 5 }.Median();     // 3
var median2 = new decimal[] { 1, 2, 3, 4, 5, 6 }.Median();  // 3.5
var median3 = new decimal[] { 3, 5, 1, 4, 2, 6 }.Median();  // 3.5
var median4 = new decimal[] { }.Median();                   // InvalidOperationException
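One design point is worth noting. OrderBy uses deferred execution, so each ElementAt call in the method enumerates the source again and, depending on the framework version, may sort it again. A possible alternative, sketched here under the hypothetical name MedianSortedOnce, copies the values into an array once and sorts that copy in place. It is not part of the original sample code, and the listing assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

decimal median = new decimal[] { 3, 5, 1, 4, 2, 6 }.MedianSortedOnce(); // 3.5
Console.WriteLine(median);

public static class MedianSortedOnceExtensions
{
    // Hypothetical variant: materialise and sort once, then index directly.
    public static decimal MedianSortedOnce(this IEnumerable<decimal> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // ToArray enumerates the source once and returns a new array,
        // so sorting it never reorders the caller's data.
        decimal[] values = source.ToArray();
        if (values.Length == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        Array.Sort(values);
        int midpoint = (values.Length - 1) / 2;

        // Odd count: the central value. Even count: mean of the two central values.
        return values.Length % 2 != 0
            ? values[midpoint]
            : (values[midpoint] + values[midpoint + 1]) / 2;
    }
}
```

The trade-off is an extra array allocation in exchange for a single enumeration and a single sort.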

Calculating the Median for a Sequence of Nullable Decimals

The Average operator allows you to calculate the mean for a collection of nullable values. This overloaded version treats null values as if they were not present in the sequence; the mean of the values 1, 2 and null is returned as 1.5. Unlike the version that processes a sequence of decimals, the nullable version does allow you to average an empty sequence. For empty sequences or those that contain only nulls the result is always null. We will use the same rules for the Median method.
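The nullable rules described above can be observed directly with the standard Average overload for decimal? values. The variable names in this sketch are illustrative, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Nulls are skipped: (1 + 2) / 2.
decimal? withNull = new decimal?[] { 1, 2, null }.Average();  // 1.5

// No non-null values: the nullable overload returns null instead of throwing.
decimal? empty = new decimal?[] { }.Average();                // null
decimal? onlyNulls = new decimal?[] { null, null }.Average(); // null

Console.WriteLine(withNull);
Console.WriteLine(empty.HasValue);     // False
Console.WriteLine(onlyNulls.HasValue); // False
```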

We can reuse the existing Median method to perform the calculation, so the new method simply performs some preparation. First, the Where operator filters out the null values and the Cast operator converts the remaining items to decimals. If this sequence is empty, we return null. If not, we pass it to the original Median method to generate the result.

public static decimal? Median(this IEnumerable<decimal?> source)
{
    var withoutNull = source.Where(n => n != null).Cast<decimal>();
    return withoutNull.Count() != 0 ? withoutNull.Median() : (decimal?)null;
}
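To see what the preparation step produces, the sketch below applies the same Where and Cast calls to a mixed sequence. Cast<decimal> works here because a non-null decimal? boxes to a plain decimal. The sketch also shows Where followed by Select(n => n.Value), an equivalent way to unwrap the values that avoids boxing; that alternative is an assumption of this example, not the article's code. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

decimal?[] values = { 4, null, 1, null, 3 };

// The filtering used by the nullable overload: drop nulls, then unwrap.
decimal[] viaCast = values.Where(n => n != null).Cast<decimal>().ToArray();

// An equivalent projection that avoids boxing each value.
decimal[] viaSelect = values.Where(n => n.HasValue).Select(n => n.Value).ToArray();

Console.WriteLine(string.Join(", ", viaCast));   // 4, 1, 3
Console.WriteLine(string.Join(", ", viaSelect)); // 4, 1, 3

// With only nulls, nothing survives, so the nullable Median returns null.
int remaining = new decimal?[] { null, null }.Where(n => n != null).Count(); // 0
```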

Testing the Method

We can test the Median method by creating collections of nullable decimals and calculating their medians:

var median1 = new decimal?[] { 1, 2, 3, 4, 5 }.Median();    // 3
var median2 = new decimal?[] { 1, 2, 3, 4, 5, 6 }.Median(); // 3.5
var median3 = new decimal?[] { 3, 5, 1, 4, 2, 6 }.Median(); // 3.5
var median4 = new decimal?[] { }.Median();                  // null
var median5 = new decimal?[] { null, null }.Median();       // null
12 March 2011