Skip to content
November 29, 2012 / Alex Miller

Roman numeral kata

Yesterday, we held the November Clojure lunch club meeting – it was great to see some new faces (Michael and Matthew) and know there were a couple more people working with Clojure in St. Louis.

We worked on the Roman Numeral kata, basically just converting between Roman numerals and decimal numbers. We first tackled converting decimal to roman. Our first attempt looped over the (diminishing) decimal value, checking a set of conditions to determine what to pull out of the decimal and emit as the next largest Roman numeral.

This approach worked but no one felt good about the big set of cases so we refactored it to be based on a data table. This approach requires looping instead over the set of roman numeral values from big to small, repeatedly tearing off the appropriate hunks.

(def numerals {1 "I",
               4 "IV",
               5 "V",
               9 "IX",
               10 "X",
               40 "XL",
               50 "L",
               90 "XC"
               100 "C",
               400 "CD",
               500 "D",
               900 "CM",
               1000 "M"})

(defn to-roman [decimal]
  (loop [[num & r :as nums] (reverse (sort (keys numerals)))
         roman ""
         d decimal]
    (if num
      (if (>= d num)
        (recur nums (str roman (get numerals num)) (- d num))
        (recur r roman d))

In the main loop we have three variables:

  • nums – The sequence of roman numeral values still to try. These are destructured into the first num and a sequence r of the rest of the them while preserving the whole sequence as nums. This is initialized from our data table by just sorting the keys in descending order (1000, 900, …).
  • roman – The accumulator of the end value, initialized to empty string
  • d – The remaining decimal value to be converted, initialized to the input

In each step of the loop, we have three cases:

  • We still have numerals to try and the d has enough left in it to catch the current numeral – in this case, remove the value of the numeral from d, and prepend the roman numeral to the accumulator.
  • We still have numerals to try and d does NOT have enough left in it to apply to the current numeral – in this case, drop the current numeral and try the next one.
  • We have no numerals left to try – just return the accumulated value.

One key insight we had during this change was to treat the two-letter combinations (like IX) as a numeral value instead of special-casing. We suggested writing some code to generate the numerals table for a smaller starting table. That still appeals to me, but certainly would have been more effort than it was worth.

At this point we did some more testing to satisfy ourselves that it was working. I did not preserve the tests but we just applied to-roman to a range of inputs and eyeballed them, then did some spot checks for larger ones (1492, 1977, etc).

Next we tackled the from-roman as the inverse of to-roman, pulling off the first character (or two!) from the input roman numeral and adding its value to an accumulated decimal. We knew we needed a data table for this one as well so we worked in the repl to munge our prior table till we had something that looked useful, then encoded the process of producing it as a digits table. We ended up reworking that on our second pass.

I did not preserve our first attempt at the from-roman code unfortunately. It worked but had a lot of brute force conditions in it. The digits table was unnecessarily complicated. This is our second pass at cleaning it up:

(def digits (apply hash-map (mapcat reverse numerals)))

(defn from-roman [roman]
  (loop [[c1 c2 & r] roman
         d 0]
    (if-let [num (and c2 (get digits (str c1 c2)))]
      (recur r (+ d num))
      (if-let [num (get digits (str c1))]
        (recur (apply str c2 r) (+ d num))

We loop through the roman numeral and pull off the first char, second char, and the rest of the chars, accumulating our final answer into d.

For each pass of the loop, if there are at least two characters in the roman numeral, we look up that two-char sequence in the digits table. If we find it (ex: IX), then we recurse with the rest of the chars and the accumulated value. If we don’t, then we try again with just the first character (ex: X) – note that this also nicely outlines the no chars case which will do a lookup of nil and return nil (all other should be found).

We had a bug in our first refactoring to this form in recombining c2 back into r (weren’t using apply). Since we had to-roman and from-roman, we were testing by round-tripping over a range of values and checking for equality. We were seeing about 20% failures over the range 1:10000. We found an example value that failed and added a println to the top of the loop to dump the values which quickly highlighted the problem.

I cleaned up the repl checks we were doing into test-round-trip:

(deftest test-round-trip
  (let [nums (range 1 10000)
        round-trip (map #(from-roman (to-roman %)) nums)]
    (is (= nums round-trip))))

That was a fun kata! I wasn’t quite sure how it would work as I’ve never done it before but it wasn’t as bad as I feared.

Full code here.



Leave a Comment
  1. Eric Wilson / Nov 29 2012 11:36 am

    For comparison, here’s my implementation:
    The most substantial difference is that I deal with all characters individually, e.g. no terms like ‘IV’ occur in my code.

  2. Mark / Nov 30 2012 12:21 am

    One question you should always ask is: how obvious is it that the code does what it purports to do? Clojure makes it so easy to cram a lot of complex functionality into one function, but it is much better to create lots of little functions that obviously do some single task. I actually used the roman numeral example to demonstrate this in a post back 2009:

    • Alex Miller / Nov 30 2012 8:44 am

      @Mark: In the case of some code written during a 1 hr group kata session, we were mostly concerned with just getting it working. 🙂

      However I take your point. I think the benefit of your approach is that it generates the sequence at the heart of the computation and then leverages the seq abstraction from there which I have generally found to be a path to clean Clojure code. It removes the loop/recur from our code too which is something I would usually prefer.

      The performance obviously isn’t a key concern but I have found cons-generated lists to be a problem more than once. I wonder whether conjing onto a list to retain the persistent data structure might be an improvement in your code?

      From elsewhere in that thread you pointed to, I noticed the Ruby subtractors version which is quite clever: Would be fun to do the equivalent in Clojure.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: