LINQ Grouping (.NET 3.5+)
The sixth part of the LINQ to Objects tutorial examines grouping using the GroupBy standard query operator and the equivalent query expression syntax. These allow a collection to be divided into smaller collections, each holding the items that share a common key.
Grouping Data
LINQ provides the ability to organise information into groups. Using either the standard query operators or query expression syntax, you can specify a key based upon the data held in a collection. The source data is then segregated into several enumerable lists, each containing all of the items with a matching key. For example, you may group a collection of stock items by their categories. The result is a group of collections, one for each unique category, each containing all of the products in that category.
Grouping of data has many uses. You may decide to group a large data set and display one group at a time through the user interface. The user may be able to change the visible group using a combo box or selection of radio buttons. You may also group the information so that you can aggregate the data, obtaining sums, averages or other aggregations for each group.
GroupBy Standard Query Operator
We will begin by examining the GroupBy standard query operator. This is an extension method of the IEnumerable<T> interface that performs grouping. Before we can begin, we need a class to work with. This will be the same StockItem type that we used in the LINQ Results Ordering article:
public class StockItem
{
    public string Name { get; set; }
    public string Category { get; set; }
    public double Price { get; set; }

    public StockItem(string name, string category, double price)
    {
        Name = name;
        Category = category;
        Price = price;
    }

    public override string ToString()
    {
        return string.Format("{0}/{1}/{2}", Name, Category, Price);
    }
}
For each example we will use the same stock item source data. To initialise the collection use the following code:
var stock = new List<StockItem>
{
    new StockItem("Apple", "Fruit", 0.30),
    new StockItem("Banana", "Fruit", 0.35),
    new StockItem("Orange", "Fruit", 0.29),
    new StockItem("Cabbage", "Vegetable", 0.49),
    new StockItem("Carrot", "Vegetable", 0.29),
    new StockItem("Lettuce", "Vegetable", 0.30),
    new StockItem("Milk", "Dairy", 1.12)
};
Simple Grouping
The simplest version of the GroupBy method accepts a single parameter containing a Func delegate, usually a lambda expression. This key selector expression is used to extract a key value from each item in the source collection. All items with a matching key are placed into one of the group collections that are returned when the query is executed.
The results of the grouping query are returned as an IEnumerable collection of IGrouping objects. An IGrouping object is simply a collection of items that have the same key. IGrouping<TKey, TElement> implements IEnumerable<TElement> and adds a Key property, of type TKey, that contains the shared key value.
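To make the types involved more concrete, the following minimal sketch declares them explicitly instead of using var. The class name GroupingTypesDemo, the GroupByLength method and the list of names are assumptions made for illustration only; they are not part of the tutorial's sample project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the tutorial's sample project.
public static class GroupingTypesDemo
{
    // Groups names by their length: the key type is int, the element type is string.
    public static IEnumerable<IGrouping<int, string>> GroupByLength(IEnumerable<string> names)
    {
        return names.GroupBy(n => n.Length);
    }

    public static void Main()
    {
        var names = new[] { "Apple", "Banana", "Orange", "Milk", "Carrot" };

        foreach (IGrouping<int, string> group in GroupByLength(names))
        {
            int key = group.Key;                // the key shared by every item in the group
            IEnumerable<string> items = group;  // the group is itself a sequence of items
            Console.WriteLine("{0}: {1}", key, string.Join(", ", items.ToArray()));
        }
    }
}
```

In LINQ to Objects the groups are yielded in the order in which each key first appears in the source, so here the five-letter group comes first, followed by the six-letter and four-letter groups.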
In the sample code below, the key selector function causes grouping according to the Category property. In many cases, grouping is performed upon a single property or the results of a simple expression. However, you can create grouping based upon multiple properties by using a key selector that returns a combination of properties in an anonymous type. For example, you could group by Category and Name using the selector, s => new { s.Category, s.Name }.
var groups = stock.GroupBy(s => s.Category);
As the result of the query is a collection of collections, we can use two nested foreach loops to output the data. Below is an outer loop that displays the Key property for each group, and an inner loop that shows an indented list of the individual items.
foreach (var group in groups)
{
    Console.WriteLine(group.Key);
    foreach (var product in group)
    {
        Console.WriteLine(" {0}", product);
    }
}
You can now run the program to execute the query and show the results. The output should be as follows:
Fruit
 Apple/Fruit/0.3
 Banana/Fruit/0.35
 Orange/Fruit/0.29
Vegetable
 Cabbage/Vegetable/0.49
 Carrot/Vegetable/0.29
 Lettuce/Vegetable/0.3
Dairy
 Milk/Dairy/1.12
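The multi-property key described earlier can be sketched in the same way. Grouping the sample data by Category and Name would give one group per item, because every name is unique, so this illustration groups by Category and the first letter of the name instead. The Initial property, the class name CompositeKeyDemo and the cut-down anonymous-type data are assumptions made purely for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the tutorial's sample project.
public static class CompositeKeyDemo
{
    public static List<string> CountByCategoryAndInitial()
    {
        // Cut-down stand-in for the stock list, using anonymous types instead of StockItem.
        var stock = new[]
        {
            new { Name = "Apple", Category = "Fruit" },
            new { Name = "Banana", Category = "Fruit" },
            new { Name = "Cabbage", Category = "Vegetable" },
            new { Name = "Carrot", Category = "Vegetable" },
            new { Name = "Lettuce", Category = "Vegetable" }
        };

        // Anonymous types compare by value, so two keys match when both properties match.
        var groups = stock.GroupBy(s => new { s.Category, Initial = s.Name[0] });

        return groups
            .Select(g => string.Format("{0}/{1}: {2}", g.Key.Category, g.Key.Initial, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in CountByCategoryAndInitial())
        {
            Console.WriteLine(line);
        }
    }
}
```

Cabbage and Carrot share both a category and an initial, so they fall into the same group; every other item ends up in a group of its own.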
Using Comparers
Unless otherwise specified, grouping uses the default comparer for the results of the key selector delegate. If there are slight differences between two keys, the items will be placed into separate groups. To see this, modify the sample data so that the "Banana" item has the category, "fruit", instead of "Fruit". If you run the program again you will see that the banana object is placed in its own group.
You can change the comparer that is used to determine the grouping. The comparer can be a standard class from the .NET framework or a custom comparer. The only restriction is that it must implement the IEqualityComparer<T> interface, where T is the type of the key.
To demonstrate, try changing the query to the following code. This uses a case-insensitive string comparer so that stock items with categories that differ only in case are combined into the same group. After testing the code you should return the banana's category to "Fruit".
var groups = stock.GroupBy(s => s.Category, StringComparer.OrdinalIgnoreCase);
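A custom comparer follows the same pattern. The sketch below is one possible implementation, assuming that categories should match when they differ only in case or in surrounding white space. The TrimmedIgnoreCaseComparer and CustomComparerDemo names are hypothetical and do not exist in the .NET framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: keys are equal when they match after trimming, ignoring case.
public sealed class TrimmedIgnoreCaseComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == null && y == null;
        return string.Equals(x.Trim(), y.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    // Equal keys must produce equal hash codes, so hash the same normalised form.
    public int GetHashCode(string key)
    {
        return key == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(key.Trim());
    }
}

// Hypothetical demo class, not part of the tutorial's sample project.
public static class CustomComparerDemo
{
    public static List<string> Describe(IEnumerable<string> categories)
    {
        return categories
            .GroupBy(c => c, new TrimmedIgnoreCaseComparer())
            .Select(g => string.Format("{0}: {1}", g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var categories = new[] { "Fruit", " fruit", "FRUIT ", "Vegetable" };
        foreach (var line in Describe(categories))
        {
            Console.WriteLine(line);
        }
    }
}
```

The Key of each group is the key of the first item that was placed into it, so the three fruit variants are reported under "Fruit" as written in the first element.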
Controlling Projection
As with other, non-grouping queries you can control the projection of the items that are placed into the groups. This is achieved using an element selector, which is a second delegate that returns the desired information. In the following code the first lambda expression is the key selector and groups items by category. The second lambda extracts the Name property from each item.
var groups = stock.GroupBy(s => s.Category, p => p.Name);

/* OUTPUT

Fruit
 Apple
 Banana
 Orange
Vegetable
 Cabbage
 Carrot
 Lettuce
Dairy
 Milk

*/
Using Result Selectors
For the final example of the GroupBy extension method we will add a result selector. A result selector is another delegate that allows the structure of the groups to be controlled. The result selector delegate accepts two arguments. The first is the key of a single group and the second is the list of items that are to be placed into the group. The delegate should return an object that represents the group, optionally including the items in the group.
It is easier to understand the use of a result selector with an example. Consider the code below. In this case three lambda expressions are being passed to the GroupBy method. The first is the key selector that specifies that the items will be grouped according to their Category property values. The second is the element selector that projects each item as a string containing the product name. The third is the result selector.
In the example, the result selector builds a new object of an anonymous type for each group. The two parameters of the lambda are category and items. For each group processed the category parameter will receive the key, which will be the category name as returned by the key selector. The items parameter will receive all of the items within the group, each being a string containing the Name property value, as defined in the element selector. The anonymous type value is constructed from the category (CategoryName), the number of items (ItemCount) and the items themselves (Items).
Try executing the code to see the results. Note that the foreach loops have been modified slightly because they are no longer working with IGrouping<TKey, TElement> objects. These have been replaced by the result selector's anonymous types.
var groups = stock.GroupBy(
    c => c.Category,
    s => s.Name,
    (category, items) => new
    {
        CategoryName = category,
        ItemCount = items.Count(),
        Items = items
    });

foreach (var group in groups)
{
    Console.WriteLine("{0}, {1} item(s)", group.CategoryName, group.ItemCount);
    foreach (var product in group.Items)
    {
        Console.WriteLine(" {0}", product);
    }
}

/* OUTPUT

Fruit, 3 item(s)
 Apple
 Banana
 Orange
Vegetable, 3 item(s)
 Cabbage
 Carrot
 Lettuce
Dairy, 1 item(s)
 Milk

*/
NB: There are several overloaded versions of the GroupBy method to explore. They allow the use of comparers, key selectors, element selectors and result selectors in various combinations.
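As one illustration of these combinations, the overload that takes a key selector and a result selector, with no element selector, is a convenient way to aggregate each group. The sketch below totals the prices per category. The GroupTotalsDemo class, the shortened data set and the use of decimal rather than double for prices are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the tutorial's sample project.
public static class GroupTotalsDemo
{
    public static Dictionary<string, decimal> TotalsByCategory()
    {
        // Shortened stand-in for the stock list; decimal keeps the sums exact.
        var stock = new[]
        {
            new { Name = "Apple", Category = "Fruit", Price = 0.30m },
            new { Name = "Banana", Category = "Fruit", Price = 0.35m },
            new { Name = "Orange", Category = "Fruit", Price = 0.29m },
            new { Name = "Cabbage", Category = "Vegetable", Price = 0.49m },
            new { Name = "Milk", Category = "Dairy", Price = 1.12m }
        };

        // Key selector plus result selector: items receives the full source objects.
        return stock
            .GroupBy(
                s => s.Category,
                (category, items) => new { Category = category, Total = items.Sum(i => i.Price) })
            .ToDictionary(r => r.Category, r => r.Total);
    }

    public static void Main()
    {
        foreach (var pair in TotalsByCategory())
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

The same shape works for other aggregations, such as Average, Min or Max, by changing the expression inside the result selector.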
Grouping with Query Expression Syntax
In the remaining sections we will recreate some of the examples from above using query expression syntax. Not all of the samples can be converted, because query expression syntax provides no way to specify a comparer. Grouping with a comparer requires a call to the GroupBy method.
Simple Grouping
The first example simply created a set of collections of stock items, grouped according to their category. We can recreate this with query expression syntax using the group clause. The clause has two parts. After the group keyword you specify the items that are to be grouped by providing the appropriate range variable. This is followed by the by keyword and an expression that defines the grouping. The sample can be recreated as follows:
var groups =
    from s in stock
    group s by s.Category;
Controlling Projection
The next example defines the projection of the items within the groups. To control the projection you can change the range variable after the group keyword to an expression. The following example returns only the Name property value for each item in the groups.
var groups =
    from s in stock
    group s.Name by s.Category;
into Clause
The into clause can be used with various operations, including selects, joins and groups. The clause causes the creation of a temporary variable for use within the query. The variable stores the results of a query and can itself be queried further. For this reason, it is sometimes called a continuation.
We can use a continuation to recreate the GroupBy syntax that uses a result selector. You can think of the query below as being in two sections. The first part groups the stock items by category and exposes each generated IGrouping<TKey, TElement> object through a new range variable named "category". The second part projects each group into an anonymous type object containing the category name, item count and the items themselves. Because the query groups s rather than s.Name, the Items property holds complete StockItem objects rather than just their names.
var groups =
    from s in stock
    group s by s.Category into category
    select new
    {
        CategoryName = category.Key,
        ItemCount = category.Count(),
        Items = category
    };
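Because the continuation variable can be queried further, other clauses may appear between the group and select clauses. The following sketch, which assumes a hypothetical ContinuationDemo class and a simplified anonymous-type version of the stock data, keeps only the categories containing more than one item and sorts them by key in descending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the tutorial's sample project.
public static class ContinuationDemo
{
    public static List<string> LargeCategories()
    {
        // Simplified stand-in for the stock list, using anonymous types instead of StockItem.
        var stock = new[]
        {
            new { Name = "Apple", Category = "Fruit" },
            new { Name = "Banana", Category = "Fruit" },
            new { Name = "Orange", Category = "Fruit" },
            new { Name = "Cabbage", Category = "Vegetable" },
            new { Name = "Carrot", Category = "Vegetable" },
            new { Name = "Lettuce", Category = "Vegetable" },
            new { Name = "Milk", Category = "Dairy" }
        };

        var query =
            from s in stock
            group s by s.Category into category
            where category.Count() > 1          // filter whole groups, not items
            orderby category.Key descending     // sort the groups by their key
            select string.Format("{0}: {1}", category.Key, category.Count());

        return query.ToList();
    }

    public static void Main()
    {
        foreach (var line in LargeCategories())
        {
            Console.WriteLine(line);
        }
    }
}
```

The Dairy category is dropped because it contains a single item, and the remaining groups are listed with Vegetable before Fruit.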