
.NET 3.5+
A LINQ Style Median Operator
Language Integrated Query (LINQ) includes the Average operator, which calculates the arithmetic mean of a sequence containing numerical data. This article describes a similar operator that determines the median.
Mean and Median
The LINQ Average operator calculates the arithmetic mean of a sequence of numbers, or of a sequence from which numbers can be extracted or calculated. The mean is found by totalling the values and dividing by the number of items in the group. Using the mean is often appropriate but in some cases the result can be skewed by atypically high or low values. In these cases, when trying to obtain a typical value for the collection, you may decide to use an alternative statistic, such as the median.
The median is the value that falls in the middle of the data when the set is sorted. If the sequence has an even number of values, the median is the mean of the two central values. The median can be useful when finding a typical value in a set that contains extremes. For example, consider a company with five employees, four of whom earn £10,000 and one who earns £1,000,000. The mean salary is £208,000, which is not typical of any employee. The median salary is £10,000, which is typical.
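The salary example can be checked with a few lines of C#. This is only a minimal sketch: the SalaryExample class and its members are hypothetical names, and the median is found here by sorting a fixed, odd-length array directly, not with the operator developed later in the article.

```csharp
using System;
using System.Linq;

public static class SalaryExample
{
    // Four employees earning 10,000 and one earning 1,000,000.
    public static readonly decimal[] Salaries =
        { 10000m, 10000m, 10000m, 10000m, 1000000m };

    // Arithmetic mean, using the standard LINQ Average operator.
    public static decimal Mean()
    {
        return Salaries.Average();
    }

    // Median of an odd-length set: the central item once sorted.
    public static decimal Median()
    {
        var sorted = Salaries.OrderBy(s => s).ToArray();
        return sorted[sorted.Length / 2];
    }

    public static void Main()
    {
        Console.WriteLine(Mean());   // 208000
        Console.WriteLine(Median()); // 10000
    }
}
```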
Average Operator
In this article we will create an extension method that calculates the median of a collection of values using a similar syntax to the standard Average operator. The Average method has many overloaded versions, allowing you to work with integers, floating-point values, nullable numeric values and other types that can be converted to a numeric value with a selector function. This article describes four overloads that all work with decimal values. These are duplicated for the other numeric data types in the downloadable sample code.
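For reference, the following sketch shows the kinds of Average call that the Median operator is intended to mirror: a plain numeric sequence, a sequence of nullable values and a selector function. The Employee and AverageOverloads classes are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only to demonstrate the selector overload.
public class Employee
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

public static class AverageOverloads
{
    // Average over a plain sequence of decimals.
    public static decimal Plain()
    {
        var values = new List<decimal> { 1m, 2m, 6m };
        return values.Average();
    }

    // Average over nullable decimals; null items are ignored.
    public static decimal? NullableValues()
    {
        var values = new List<decimal?> { 1m, null, 3m };
        return values.Average();
    }

    // Average of values extracted from each item by a selector function.
    public static decimal WithSelector()
    {
        var staff = new List<Employee>
        {
            new Employee { Name = "A", Salary = 10000m },
            new Employee { Name = "B", Salary = 20000m }
        };
        return staff.Average(e => e.Salary);
    }

    public static void Main()
    {
        Console.WriteLine(Plain());          // 3
        Console.WriteLine(NullableValues()); // 2
        Console.WriteLine(WithSelector());   // 15000
    }
}
```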
Creating the Class
To begin, we need a static class to hold the extension methods. Create a new project and add a class named "MedianExtensions". Modify the class's definition as follows:
using System;
using System.Collections.Generic;
using System.Linq;

public static class MedianExtensions
{
}
Calculating the Median for a Sequence of Decimals
The first method will calculate the median for a sequence of decimal values and will return a decimal result. To mirror the functionality of the Average operator, the method must process any series that implements the IEnumerable<decimal> interface. The definition for the method is therefore:
public static decimal Median(this IEnumerable<decimal> source)
{
}
The first step inside the method is to obtain the number of items in the sequence. For this we can use the Count operator. In the equivalent Average method, trying to process an empty sequence throws an InvalidOperationException. We can duplicate this with an if statement, as follows:
int decimals = source.Count();
if (decimals != 0)
{
}
else
{
    throw new InvalidOperationException("Sequence contains no elements");
}
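The behaviour being duplicated can be confirmed with a short sketch. The EmptyAverageCheck class and its ThrowsOnEmpty method are hypothetical names; the method simply calls Average on an empty sequence of decimals and reports whether an InvalidOperationException was thrown.

```csharp
using System;
using System.Linq;

public static class EmptyAverageCheck
{
    // Returns true if Average throws InvalidOperationException
    // when applied to an empty sequence of decimals.
    public static bool ThrowsOnEmpty()
    {
        var empty = new decimal[0];
        try
        {
            empty.Average();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnEmpty()); // True
    }
}
```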
If there are elements in the series we can now determine the median for sequences with an odd number of items. First we calculate the midpoint index by subtracting one from the number of items, as the sequence is zero-based, and halving the result using integer division. For an odd number of items this is the index of the central item. For an even number it is the index of the lower of the two central items. With the items sorted using the OrderBy method, we can extract the item at the midpoint using LINQ's ElementAt operator. To do this, add the following code within the if statement's empty code block:
var midpoint = (decimals - 1) / 2;
var sorted = source.OrderBy(n => n);
var median = sorted.ElementAt(midpoint);
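To see how the integer division behaves, the sketch below lists the midpoint index for a few sequence lengths. The MidpointDemo class is a hypothetical helper; the formula is the one used in the code above.

```csharp
using System;

public static class MidpointDemo
{
    // Zero-based index of the central item (odd count) or of the
    // lower of the two central items (even count).
    public static int Midpoint(int count)
    {
        return (count - 1) / 2;
    }

    public static void Main()
    {
        // Pairs of count and midpoint: 1 and 0, 2 and 0, 3 and 1,
        // 4 and 1, 5 and 2, 6 and 2.
        for (int count = 1; count <= 6; count++)
        {
            Console.WriteLine(count + " items: midpoint index " + Midpoint(count));
        }
    }
}
```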
If the sequence has an even number of items we also need to extract the item following the midpoint and find the mean of the two values. This is achieved with a second if statement:
if (decimals % 2 == 0)
{
    median = (median + sorted.ElementAt(midpoint + 1)) / 2;
}
return median;
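Putting the steps together, the first overload might look like the following sketch. The method body is assembled from the fragments above; the Program class and its sample data are hypothetical additions used only to exercise the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MedianExtensions
{
    public static decimal Median(this IEnumerable<decimal> source)
    {
        int decimals = source.Count();
        if (decimals != 0)
        {
            // Index of the central item, or of the lower central item
            // when the count is even.
            var midpoint = (decimals - 1) / 2;
            var sorted = source.OrderBy(n => n);
            var median = sorted.ElementAt(midpoint);

            // Even count: take the mean of the two central items.
            if (decimals % 2 == 0)
            {
                median = (median + sorted.ElementAt(midpoint + 1)) / 2;
            }
            return median;
        }
        else
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }
    }
}

// Hypothetical test harness, not part of the operator itself.
public static class Program
{
    public static void Main()
    {
        var odd = new List<decimal> { 5m, 1m, 3m };      // median is 3
        var even = new List<decimal> { 4m, 1m, 3m, 2m }; // median is 2.5

        Console.WriteLine(odd.Median());
        Console.WriteLine(even.Median());
    }
}
```

Because OrderBy uses deferred execution, each call to ElementAt evaluates the ordered sequence again, and Count enumerates the source separately. For large sequences, one possible variation is to materialise the sorted items once, for example with ToList, and index into that list.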
12 March 2011