.NET 3.5+

Performing Cross Joins with LINQ to Objects

by Richard Carr, published at http://www.blackwasp.co.uk/LinqCrossJoins.aspx

A cross join, also known as a Cartesian product, joins two sequences of values to create a new collection where every possible combined pair is represented. When using Language-Integrated Query (LINQ), cross joins can replace nested loops.

Cross Joins

In previous LINQ articles we've seen two types of join. Inner joins are performed using the Join standard query operator, or the join keyword when using query expression syntax. Inner joins find all items in one sequence that match those in a second collection, based upon key selectors. Every combination of items with matching keys is projected into a new collection. This means that the items from each collection may be represented once, multiple times or never in the results.

LINQ also supports left outer joins. With such a join, every item in the first collection will be present in the results, even if there is no corresponding key in the second sequence. It's still possible that items in the first collection are used to produce multiple results, where the key selection generates more than one match.
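
As a brief recap of the two joins described above, the following sketch contrasts them. The Customer and Order classes and their data are hypothetical, invented purely for illustration. The left outer join uses the common pattern of a group join (join ... into) followed by DefaultIfEmpty, which is one way to express it rather than the only one.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types used only for this illustration.
public class Customer { public int Id; public string Name; }
public class Order { public int CustomerId; public string Item; }

public static class JoinRecap
{
    static readonly Customer[] Customers =
    {
        new Customer { Id = 1, Name = "Ann" },
        new Customer { Id = 2, Name = "Bob" },
        new Customer { Id = 3, Name = "Cy" }
    };

    static readonly Order[] Orders =
    {
        new Order { CustomerId = 1, Item = "Pen" },
        new Order { CustomerId = 1, Item = "Ink" },
        new Order { CustomerId = 3, Item = "Pad" }
    };

    // Inner join: Ann appears twice, Bob (no orders) never appears.
    public static List<string> InnerJoin()
    {
        var query =
            from c in Customers
            join o in Orders on c.Id equals o.CustomerId
            select c.Name + ":" + o.Item;
        return query.ToList();
    }

    // Left outer join: every customer appears at least once.
    // DefaultIfEmpty supplies null when a customer has no matching orders.
    public static List<string> LeftOuterJoin()
    {
        var query =
            from c in Customers
            join o in Orders on c.Id equals o.CustomerId into matches
            from m in matches.DefaultIfEmpty()
            select c.Name + ":" + (m == null ? "(none)" : m.Item);
        return query.ToList();
    }
}
```

With this data, InnerJoin returns three results and omits Bob, whereas LeftOuterJoin returns four, including a placeholder entry for Bob.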

Another type of join is the cross join, or Cartesian product. When combining two sequences using this process, every item in the first collection is combined with every item in the second. No key selection is required as there is no filtering of data. The resultant sequence will always have a number of items equal to the product of the sizes of the two source sequences.
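
The size rule can be checked with a small sketch. The CrossJoinSize class and its CountPairs method are hypothetical names chosen for this example; the method cross joins two arrays and counts the combined items.

```csharp
using System.Linq;

public static class CrossJoinSize
{
    // Cross joins two sequences and returns the number of combined pairs.
    public static int CountPairs(int[] first, string[] second)
    {
        var pairs =
            from a in first
            from b in second
            select a + b;   // int + string gives a concatenated string
        return pairs.Count();
    }
}
```

For example, a first sequence of three items and a second of two gives six pairs. If either sequence is empty, the product is zero and the result is empty.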

Query Expression Syntax

Performing a cross join with LINQ's query expression syntax is simply a case of including two from clauses - one for each source sequence. You then add a projection using the select keyword. The code remains concise whilst the intent is very clear, giving good readability and maintainability.

The following sample code demonstrates a cross join query. Here we have eight letters, representing the columns of a chess board. The second collection has eight digits for the board's rows. When we combine them with a cross join, we generate sixty-four co-ordinates, one for each of the chess board's squares.

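// Requires "using System;" and "using System.Linq;" at the top of the file.
char[] letters = "ABCDEFGH".ToCharArray();  // board columns
char[] digits = "12345678".ToCharArray();   // board rows

// Two from clauses with no join condition pair every letter with every digit.
var coords =
    from l in letters
    from d in digits
    select l.ToString() + d;

foreach (var coord in coords)
{
    Console.Write("{0} ", coord);
    // Break the line after the eighth digit so each letter gets its own row of output.
    if (coord.EndsWith("8"))
    {
        Console.WriteLine();
    }
}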

/* OUTPUT
            
A1 A2 A3 A4 A5 A6 A7 A8
B1 B2 B3 B4 B5 B6 B7 B8
C1 C2 C3 C4 C5 C6 C7 C8
D1 D2 D3 D4 D5 D6 D7 D8
E1 E2 E3 E4 E5 E6 E7 E8
F1 F2 F3 F4 F5 F6 F7 F8
G1 G2 G3 G4 G5 G6 G7 G8
H1 H2 H3 H4 H5 H6 H7 H8
            
*/

You can see from the results in the comment that the first sequence in the query is processed once, whereas the second sequence, containing the numeric digits, is enumerated again for each letter. If the second sequence is of a type that may only be read once, the query will fail or produce incomplete results. In such situations you should materialise the sequence first by calling ToArray, ToList or a similar method.
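
The effect of a read-once sequence can be sketched as follows. The Drain method is a hypothetical stand-in for such a source: it removes items from a queue as it yields them, so a second enumeration finds nothing left. In this sketch the problem shows up as silently missing results rather than an exception; other read-once sources may behave differently.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReadOnceDemo
{
    // Hypothetical read-once source: items are dequeued as they are yielded,
    // so any later enumeration of the same queue produces no items.
    static IEnumerable<char> Drain(Queue<char> queue)
    {
        while (queue.Count > 0)
        {
            yield return queue.Dequeue();
        }
    }

    // The digits are consumed while processing the first letter,
    // leaving nothing to pair with the second.
    public static List<string> WithoutMaterialising()
    {
        char[] letters = "AB".ToCharArray();
        IEnumerable<char> digits = Drain(new Queue<char>("12"));
        var coords =
            from l in letters
            from d in digits
            select l.ToString() + d;
        return coords.ToList();
    }

    // ToList reads the source once and stores the items,
    // so the cross join can enumerate them repeatedly.
    public static List<string> WithMaterialising()
    {
        char[] letters = "AB".ToCharArray();
        List<char> digits = Drain(new Queue<char>("12")).ToList();
        var coords =
            from l in letters
            from d in digits
            select l.ToString() + d;
        return coords.ToList();
    }
}
```

Here WithoutMaterialising returns only A1 and A2, whereas WithMaterialising returns all four co-ordinates.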

Using Standard Query Operators

To achieve the same results using standard query operators you can use the SelectMany operator. You would normally use this extension method to perform a one-to-many projection and flatten the results into a single sequence. For example, you might have a collection of team members, with each object in the sequence containing a list of skills that the corresponding person possesses. With SelectMany, you could create a new sequence containing the combined skills of all of the team.
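
The team skills scenario might look like the sketch below. The TeamMember class, its fields and the sample data are assumptions made for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a person and the skills that they possess.
public class TeamMember
{
    public string Name;
    public List<string> Skills;
}

public static class SkillsDemo
{
    // Flattens each member's skill list into one combined sequence.
    public static List<string> AllSkills()
    {
        var team = new List<TeamMember>
        {
            new TeamMember { Name = "Ann", Skills = new List<string> { "C#", "SQL" } },
            new TeamMember { Name = "Bob", Skills = new List<string> { "SQL", "HTML" } }
        };

        return team.SelectMany(m => m.Skills).ToList();
    }
}
```

SelectMany keeps duplicates, so "SQL" appears twice in the combined list. If you need each skill only once, you could append a call to Distinct.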

To perform a cross join you use SelectMany against the first sequence. Where you would normally use a lambda expression to specify which child items to retrieve, you instead provide a lambda that selects the items from the second sequence. This means that all items in the second collection are selected once for every item in the first sequence. You can then project the results to combine the details from the two lists.

The following single line of code produces the same results as the earlier query. The first lambda expression obtains the digits once for each letter. The second function combines the letter and digit for each of the sixty-four results. Although this produces the same results, you could argue that it is less readable than the equivalent query expression syntax.

var coords = letters.SelectMany(l => digits, (l, d) => l.ToString() + d);

Non-LINQ Equivalent

Either of the two LINQ-based approaches gives a declarative way to perform a cross join. If you prefer to use an imperative approach, specifying exactly how the results are generated rather than leaving this to LINQ, you could use nested foreach loops, as shown below:

// Requires "using System.Collections.Generic;" for List<T>.
List<string> coords = new List<string>();
foreach (char letter in letters)
{
    foreach (char digit in digits)
    {
        coords.Add(letter.ToString() + digit);
    }
}
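
As a rough check that the three approaches are interchangeable, the following sketch wraps each one in a method and compares the results with SequenceEqual. The CrossJoinComparison class and its method names are hypothetical and exist only for this comparison.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CrossJoinComparison
{
    static readonly char[] Letters = "ABCDEFGH".ToCharArray();
    static readonly char[] Digits = "12345678".ToCharArray();

    // Query expression syntax with two from clauses.
    public static List<string> QuerySyntax()
    {
        var coords =
            from l in Letters
            from d in Digits
            select l.ToString() + d;
        return coords.ToList();
    }

    // SelectMany with a collection selector and a result selector.
    public static List<string> MethodSyntax()
    {
        return Letters.SelectMany(l => Digits, (l, d) => l.ToString() + d).ToList();
    }

    // Imperative nested loops.
    public static List<string> NestedLoops()
    {
        List<string> coords = new List<string>();
        foreach (char letter in Letters)
        {
            foreach (char digit in Digits)
            {
                coords.Add(letter.ToString() + digit);
            }
        }
        return coords;
    }

    // True when all three approaches give the same items in the same order.
    public static bool AllMatch()
    {
        List<string> expected = NestedLoops();
        return QuerySyntax().SequenceEqual(expected)
            && MethodSyntax().SequenceEqual(expected);
    }
}
```

One difference that this comparison hides is timing. The two LINQ versions are not evaluated until they are enumerated, which ToList forces here, whereas the nested loops fill the list immediately.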

25 August 2013