BlackWasp
Algorithms
.NET 1.1

Convert Roman Numerals into Numbers

In an earlier article we converted integer values into Roman numerals, an ancient numbering system that uses letters to represent values. In this article we will reverse the process, allowing Roman numerals to be evaluated as an integer.

Roman Numerals System

The Roman numerals system is described in the earlier article describing how to convert an integer value into Roman numerals. I won't repeat the details in this article, except for the rules for combining the letters of the Roman numerals, as these are required for the algorithm. For easy reference purposes, the Roman numeral letters are as follows:

LetterValue
I1
V5
X10
L50
C100
D500
M1,000

Using combinations of these letters within several defined rules allows the representation of numbers. The rules are described in the following sections.

Rule 1 - Repetition

A single letter may be repeated up to three times consecutively with each occurrence of the value being additive. This means that I is one, II means two and III is three. However, IIII is incorrect for four.

Rule 2 - Additive Combination

Larger numerals must be placed to the left of the smaller numerals to continue the additive combination. So VI equals six and MDCLXI is 1,661.

Rule 3 - Subtractive Combination

A small-value numeral may be placed to the left of a larger value. Where this occurs, for example IX, the smaller numeral is subtracted from the larger. This means that IX is nine and IV is four. The subtracted digit must be at least one tenth of the value of the larger numeral and must be either I, X or C. Accordingly, ninety-nine is not IC but rather XCIX. The XC part represents ninety and the IX adds the nine. In addition, once a value has been subtracted from another, no further numeral or pair may match or exceed the subtracted value. This disallows values such as MCMD or CMC.

Rule 4 - Repeated Use of V, L and D

The numerals that represent numbers beginning with a '5' (V, L and D) may only appear once in each Roman numeral. This rule permits XVI but not VIV.

Rule 5 - Reducing Values

The fourth rule compares the size of value of each the numeral as read from left to right. The value must never increase from one letter to the next. Where there is a subtractive numeral, this rule applies to the combined value of the two numerals involved in the subtraction when compared to the previous letter. This means that XIX is acceptable but XIM and IIV are not.

Rule 6 - Multiplication

To represent numbers of four thousand or greater, lines are added to each letter. For example, a line above a letter multiplies its value by one thousand. To represent 15,015 the Roman numerals are VVVVVV. This rule is not implemented in the algorithm so the code is limited to values up to but not including four thousand.

Rule 7 - Zero

There is very little information that suggests that the system originally had a notation for zero. However, the letter N has been used to represent zero in a text from around 725AD. This will be used in the algorithm.

The Algorithm and the Code

The algorithm and the code for converting a well-formed Roman numeral to an integer is actually very simple to implement. The difficulty of this algorithm is testing that the Roman numeral is indeed well formed and therefore breaks none of the rules described above. The code provided below performs the all of the validation checks before combining the numerals into the final result.

Preparation

Depending upon how you wish to use this algorithm, you may want to create it in a new class, incorporate it in an existing class or simply hide it behind a form event. To keep the code simple and short, this article assumes you have a class of some kind prepared and are adding a new method to that class. The code simply declares a new public method that accepts the string to be converted as its only parameter.

To prepare, create or identify the class that you will add the method to, then add the following declaration code:

// Converts a Roman numerals value into an integer
public int RomanToNumber(string roman)
{

}

NB: If you are creating or adding to a utilities class, you may wish to adjust the declaration to make the method as static.

Holding the Numeral Values

Each of the seven numerals must be associated with its integer equivalent value. As each letter can be thought of as a named integer value, an enumeration is an effective way of declaring the values. Each letter can be named and an integer value associated with it directly by adding the following code to the class.

private enum RomanDigit
{
    I = 1,
    V = 5,
    X = 10,
    L = 50,
    C = 100,
    D = 500,
    M = 1000
}

NB: Remember to mark the enumeration as static if you have created a static method for the RomanToNumber method.

Rule 7

The first rule to added to the class is rule 7. This states that if the entire Roman numeral string contains the letter 'N' then the returned value will be zero. The comparison could be made using the string comparison methods to ensure case-insensitivity. However, as later in the algorithm it will be useful to have an upper case string, we will capitalise the string before testing for 'N'. We will also trim off any excess spaces to reduce the risk of error. These actions can be achieved by adding the following code to the RomanToNumber method:

// Rule 7
roman = roman.ToUpper().Trim();
if (roman == "N") return 0;

Rule 4

Rule 4 states that the letters V, L and D may only appear once in the entire Roman numeral string. A neat way to test for duplicates is to use the Split method to convert the Roman numeral string into an array of items delimited by the numeral character. If there is more than a single instance of the letter, there will be more than two elements in the array. This is tested in an if statement for each letter. If any of the three numerals are repeated then an exception is thrown.

To add rule 4, add the following code to the end of the RomanToNumber method:

// Rule 4
if (roman.Split('V').Length > 2 ||
    roman.Split('L').Length > 2 ||
    roman.Split('D').Length > 2)
    throw new ArgumentException("Rule 4");

Invalid Numerals and Rule 1

The next section of code performs two checks. Firstly the individual letters are checked to ensure that each is a valid numeral. Secondly, rule 1, which states that a numeral may not be repeated consecutively more than three times, is validated.

This is achieved by looping through the string using a foreach loop to extract each character in turn. The test for invalid characters uses the IndexOf method to determine if the character is within a string of allowed numerals and an exception is thrown if it is not. To check for compliance with rule 1 a counter is incremented every time a letter matches the previous character. If the counter reaches four then too many repetitions exists and an exception is thrown. If the two numerals differ, the counter is reset.

To add the invalid numeral test and the rule 1 compliance check, add the following code to the end of the method:

// Rule 1
int count = 1;
char last = 'Z';
foreach (char numeral in roman)
{
    // Valid character?
    if ("IVXLCDM".IndexOf(numeral)==-1)
        throw new ArgumentException("Invalid numeral");

    // Duplicate?
    if (numeral == last)
    {
        count++;
        if (count == 4)
            throw new ArgumentException("Rule 1");
    }
    else
    {
        count = 1;
        last = numeral;
    }
}

NB: The last variable is initialised with a 'Z' as a dummy value as this cannot match the first numeral causing incorrect results.

Digit Conversion and Rule 3

The next section of the code begins the conversion of the Roman numeral into an integer value whilst incorporating rule 3. This rule describes how a smaller-valued numeral before a larger one subtracts its value from the larger digit. The rule also provides several validations.

Let's review the code to be added to the end of the method first, then describe each part in turn.

// Create an ArrayList containing the values
int ptr = 0;
ArrayList values = new ArrayList();
int maxDigit = 1000;
while (ptr < roman.Length)
{
    // Base value of digit
    char numeral = roman[ptr];
    int digit = (int)Enum.Parse(typeof(RomanDigit), numeral.ToString());

    // Rule 3
    if (digit > maxDigit)
        throw new ArgumentException("Rule 3");

    // Next digit
    int nextDigit = 0;
    if (ptr < roman.Length - 1)
    {
        char nextNumeral = roman[ptr + 1];
        nextDigit = (int)Enum.Parse(typeof(RomanDigit), nextNumeral.ToString());

        if (nextDigit > digit)
        {
            if ("IXC".IndexOf(numeral) == -1 ||
                nextDigit > (digit * 10) ||
                roman.Split(numeral).Length > 3)
                throw new ArgumentException("Rule 3");

            maxDigit = digit - 1;
            digit = nextDigit - digit;
            ptr++;
        }
    }

    values.Add(digit);

    // Next digit
    ptr++;
}

The first section creates a simple pointer that will be used to point to each character in turn whilst skipping the second letter pairs that use subtractive combination. It also declares an ArrayList. This collection will contain the value of every digit or subtractive digit pair from the Roman numeral. These values that will be totalled later. An ArrayList has been selected because the required length is initially unknown and because it can later be accessed like an array using its indexer. The ArrayList class is in the System.Collections namespace so add using System.Collections; to the start of the code. Finally, the maximum permitted value of a digit is declared. This will be checked for each digit as the Roman numeral is parsed and reduced as subtractions are processed to permit checking of part of rule 3.

Once the variables are declared, a while loop is started. The loop will continue until the pointer variable has been increased to the point where it is pointing to a character beyond the end of the Roman numeral string.

The next section converts the numeral at the pointer position into an integer value by matching the character against the enumeration declared earlier. This is then tested to ensure it does not exceed the maximum digit value. If a further digit exists in the Roman numeral, this is also converted to an integer. The two values are compared and if the value of the first digit is less than the second then subtractive combination applies.

The next if statement performs three validation checks. Firstly the value being subtracted must be I, X or C. The second test ensures that the value being subtracted is at least one tenth of the value of the second numeral. If these tests are passed, the string is checked to ensure that the subtracted numeral does not appear twice or more elsewhere. We use the Split method again for this.

If all tests are passed, the maximum digit value is adjusted to support rule 3, the subtraction is performed and the result held as the current digit's value. The pointer variable is also incremented to skip the second letter in the pair.

Following the if statement, the value of the single numeral or the subtractive pair is added to the ArrayList for later totalling. The pointer variable is then increased to move to the next available numeral before the closing brace of the while loop.

If you haven't added the above code, add it to the end of the method now.

Rule 5

Rule 5 states that each numeral, or pair of subtractive numerals, must be greater in value than the following one. As we now have all of the values in the ArrayList, we can simply iterate through the values to validate. Add the following code to implement rule 5.

// Rule 5
for (int i = 0; i < values.Count - 1;i++)
    if ((int)values[i] < (int)values[i + 1])
        throw new ArgumentException("Rule 5");

Rule 2

The last rule to be created is rule 2. This rule explains that the values of the numerals and subtractive pairs can be totalled to determine the final value of the Roman numeral. This is achieved with a final loop through the values collection. Add the following to the end of the method to complete the class.

// Rule 2
int total = 0;
foreach (int digit in values)
    total += digit;

return total;
Link to this Page22 January 2008
RSS RSS Feed