Wednesday, August 24, 2011

Successor

A few days ago, I wrote about translating the humanize function from the slang.js "string utility" library into Factor. While looking through the other functions defined in that library, I came across the successor function.

The "successor" of a string is defined to be a kind of "alphanum increment". It's easiest to show a few examples of how it works:

successor("a")         == "b"
successor("1")         == "2"
successor("abcd")      == "abce"
successor("THX1138")   == "THX1139"
successor("<<koala>>") == "<<koalb>>"
successor("1999zzz")   == "2000aaa"
successor("ZZZ9999")   == "AAAA0000"

We are going to implement this in Factor, using the slang.js documentation as a guide:

"Returns the successor to str. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case.

If the increment generates a carry, the character to the left of it is incremented. This process repeats until there is no carry, adding an additional character if necessary."

To start, we should handle the "carry" logic. There are two kinds of carries: digits and letters. Both involve checking if a character has exceeded a range (resetting it to the beginning of the range if it has). We can build a word that does this, returning the new character value as well as a boolean flag indicating if a carry occurred:
: carry ( elt last first -- ? elt' )
    '[ _ > dup _ ] keep ? ;
Using this to carry digits is pretty easy (using the 0 to 9 range):
: next-digit ( ch -- ? ch' )
    1 + CHAR: 9 CHAR: 0 carry ;
To carry letters, we need to make sure that the carry preserves the original case (uppercase or lowercase) of the letter:
: next-letter ( ch -- ? ch' )
    [ ch>lower 1 + CHAR: z CHAR: a carry ] [ LETTER? ] bi
    [ ch>upper ] when ;
And, finally, to generalize this to all characters, we check if it is a digit or a letter and dispatch to the proper function, or pass the character through if it is neither:
: next-char ( ch -- ? ch' )
    {
        { [ dup digit?  ] [ next-digit  ] }
        { [ dup Letter? ] [ next-letter ] }
        [ t swap ]
    } cond ;
This leaves the core of the algorithm, which starts at the end of the string, incrementing each character (continuing if the carry flag is true), and then handling the case where we need to carry the first character:
: (successor) ( str -- str' )
    dup length t [ over 0 > dupd and ] [
        drop 1 - dup pick [ next-char ] change-nth
    ] while nip [ dup first prefix ] when ;

: successor ( str -- str' )
    dup empty? [ (successor) ] unless ;
The code for this is on my Github, and compares favorably at 30 lines of code versus the original 50. The (successor) function uses a fair amount of stack shuffling, can you improve it?

1 comment:

Rupert said...

As I mentioned on IRC, I wrote down a slightly different way of going about things here: http://paste.factorcode.org/paste?id=2339. Probably much slower, but avoids some of the stack-shuffling.