Sometimes it is necessary to randomise the order of a sequence of values. The Fisher-Yates algorithm provides a paper-based method, which was later computerised by Richard Durstenfeld. This article implements the algorithm as a custom LINQ operator.
Occasionally you might need to change the order of the items in a sequence in order to randomise the elements. A classic example is where you have objects that represent a deck of playing cards that must be shuffled before starting a new game. There are many ways in which you can shuffle a collection, though some provide weighted results that are not ideal if you need true randomness. This is especially true for the playing card example; a poor shuffling algorithm may allow players to cheat.
One way to randomise the order of a sequence of items is using the Fisher-Yates Shuffle algorithm. This was published by Ronald A Fisher and Frank Yates in 1938. The iterative algorithm describes a method for randomising the order of a simple increasing set of numbers. It is followed using a pencil and paper. The steps are:
- Write out the list of numbers to be shuffled.
- Pick a random number between one and the number of numbers that have not been crossed out. For the first pass no numbers will have been crossed out. Fisher and Yates described methods for selecting a random number. These are key to the algorithm's use but are unimportant for the purposes of this article.
- Count through the numbers that are not crossed out to find the item that appears at the position defined by the random number. For the first pass, write this number on a separate piece of paper. For subsequent iterations, write it next to the previously copied number. This will produce the shuffled list.
- Repeat the process from step 2 until all numbers in the original list have been crossed out.
We can demonstrate the algorithm by shuffling the numbers between one and five. We start with our new list fully populated and our shuffled list empty:
Original: 1, 2, 3, 4, 5
We now pick a random number between one and five, as there are five items in the list. This gives an equal probability for any of the original items becoming the first item in the shuffled list. Let's imagine that the first random number selected is 2. We must take the second item and copy it into the shuffled list. It must then be crossed out.
2, 3, 4, 5
The next random number must be between one and four, as there are only four items that remain unused in the original list. If we generate a three, we must take the third item from the list, ignoring any crossed out numbers. This is the number 4.
2, 3, 4, 5
Shuffled: 2, 4
We now need a number between one and three. Let's assume we generate the number 2. The second item that has yet to be used is 3:
2, 3, 4, 5
Shuffled: 2, 4, 3
We now have only two items remaining so need only one random number to decide which is next and which will be last. Assuming we randomly pick the number 1, the final results are:
1, 2, 3, 4, 5
Shuffled: 2, 4, 3, 1, 5
In 1964, Richard Durstenfeld published an algorithm that was based upon the Fisher-Yates method but designed for computer use. The algorithm shuffles an array of any type. The steps for this algorithm are:
- Determine the length of the array (len).
- Loop through each of the values between len and one, decrementing the loop control variable (lcv) for each iteration.
- Randomly select a value (n) between one and the current lcv.
- Swap the contents of the array elements at indexes n and lcv. This moves the randomly selected element beyond the indexes that will be considered in the next iteration. NB: This assumes that the first item in the array appears at index 1, rather than being zero-based, as in C#.
- Continue the loop, restarting the randomisation process from step 3 with a lower lcv.
When the Durstenfeld algorithm is used the results are the same as with Fisher-Yates. The array is modified directly. This gives the advantage that a new array, with additional memory requirements, is not generated. However, it means that the original order is no longer available. If you need to keep the original sequence you need to create a copy to shuffle.
In this article we'll create an extension method that shuffles a sequence using an algorithm based upon the Durstenfeld implementation. However, it will generate a new, shuffled copy of the source sequence, leaving the original collection unchanged. The method will work in a similar manner to the standard operators of Language-Integrated Query (LINQ). It will process any sequence that implements IEnumerable<T> and return results using the same interface. As with LINQ operators, the method will validate the input immediately but use deferred execution for the shuffling process.
NB: In the source code, which you can download using the link at the start of the article, we use the Random class to generate the random numbers. For most scenarios the pseudo-random numbers are sufficient. In cases where true randomness is required you should use an alternative approach, such as employing a hardware random number generator.
2 January 2013