Thursday, May 2, 2013

Non-repeating

The Programming Praxis blog, which recently achieved the milestone of One Million Hits, posted a challenge a couple days to find the first unrepeated character in a string:

Given a string, find the first character that appears only once in the string. For instance, given the string “aabbcddd”, the first character that appears only once is the “c” found at the 4th character in the string, counting from 0. Be sure that your program properly handles the case that all characters appear more than once.

Our quick-and-dirty solution, using lexical variables, will add each element in the sequence to a hash-set, adding the element to an accumulator if not seen before, removing the element otherwise:

:: non-repeating ( seq -- seq' )
    HS{ } clone :> visited
    0 seq new-resizable :> accum
    seq [
        dup visited ?adjoin
        [ accum push ] [ accum remove! drop ] if
    ] each accum seq like ;

We can then use it like so:

! find all non-repeating characters (in-order)
IN: scratchpad "abcddd" non-repeating .
"abc"

! find the first non-repeating character
IN: scratchpad "abcddd" non-repeating first "%c" printf
a

An improvement might be, instead of using remove! which will look for and remove all occurrences of an object in a sequence, we could instead only remove the first occurrence (knowing an object will only ever be in a sequence once):

: remove-first! ( obj seq -- seq )
    [ index ] keep over [ remove-nth! ] [ nip ] if ;

How else might we improve it?