Monday, May 24, 2010

Evenly Partition an Integer

Update: I just noticed after re-watching Slava's Google Tech Talk that he discusses his vector version of this word (which I originally was calling "distribute") at 34:20. Very cool.

I came across a question on Stack Overflow about how to evenly divide an integer.

The discussion was not particularly interesting, but it reminded me of a similar problem that I had to solve, and decided to implement in Factor. This was early in my experience with Factor, and the conversations on IRC proved helpful for understanding how to structure my Factor code, which I want to share.

This type of problem is part of number theory. There is some discussion on Wolfram Mathworld describing the various ways that a positive integer can be partitioned into the sum of some number of other positive integers.

In my particular case, I needed an even division with the result to be produced in such a way that the rounding affect was distributed across the result set. For example:

( scratchpad ) 3 1 n-partition .
{ 3 }

( scratchpad ) 3 3 n-partition .
{ 1 1 1 }

( scratchpad ) 5 3 n-partition .
{ 2 1 2 }

( scratchpad ) 3 5 n-partition .
{ 1 0 1 0 1 }

In an applicative language like Python, you might write this algorithm like so:

def n_partition(x, n):
    delta = float(x) / n
    last, total = 0, 0.0
    l = []
    for _ in range(n):
        total += delta
        value = int(round(total))
        l.append(value - last)
        last = value
    return l

My first version in Factor solved the problem, but was terribly ugly and non-idiomatic:

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip [
        + [ [ dup ] dip + ] dip  
        [ dup round ] dip 2dup -
        [ drop ] dip 
    ] map nip nip nip ;

One sign that you are "doing it wrong" in Factor is frequent usage of stack manipulation using shuffle words. Another sign is creating large words that would be better when broken up into smaller pieces. My solution had both characteristics. Worse, I could barely read it.

The Factor developers are very responsive to questions from beginners, and are helpful in providing an experienced eye to guide newcomers in learning better ways to write their programs. You can reach them on the mailing list or on the #concatenative IRC channel. Often it is useful to paste code so that others can see and comment on it. I thought I'd ask for help, and see if I could learn how to improve.

I received some iterative feedback showing each step along the way towards improving the original code, by using higher-level words and concepts:

  1. First, extract the logic inside map into its own word:
: (n-partition) ( a b c d -- a b c d )
    + [ [ dup ] dip + ] dip  
    [ dup round ] dip 2dup -
    [ drop ] dip ; 

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Instead of the first +, use drop since there is always a zero there.
  2. Instead of [ dup ] dip, use dupd.
  3. Instead of [ drop ] dip, use nip.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + ] dip [ dup round ] dip 2dup - nip ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Realize that [ foo ] dip [ bar ] dip is the same as [ foo bar ] dip.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + dup round ] dip 2dup - nip ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Realize that 2dup - nip is dupd -.
  2. Realize that dupd is [ dup ] dip and apply the [ foo bar ] dip rule again.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Now, extract the drop out and put it inside the map.
: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip 
    [ drop (n-partition) ] map nip nip nip ;
  1. Realize that [ drop foo ] map is [ foo ] replicate.
: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip 
    [ (n-partition) ] replicate nip nip nip ;
  1. Now, replicate doesn't care what elements of the sequence are, only its length, so you may as well be using "n" instead of an array of "n" zeroes, so we can remove the 0 <array>.
  2. Finally, [ foo ] keep [ bar ] dip is the same as [ foo bar ] keep.

This produces our final "cleaned up" version:

: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq )
    [ / 0 0 ] keep [ (n-partition) ] replicate nip nip nip ;

Afterwards, I discussed with Slava other alternatives. One idea was to use locals for the (n-partition) word, but it was not much of an improvement by itself (perhaps better if we introduced meaningful names instead of a b c d):

:: (n-partition) ( a b c -- a b c d )
    a a b + dup round dup c - ;

Another idea was to work on an array of values and then apply vector arithmetic on it:

: percentages ( n -- seq ) [ [1,b] ] keep v/n ;

: steps ( x n -- seq ) percentages n*v ;

: rounded ( seq -- seq' ) [ round ] map ;

: differences ( seq -- seq' ) dup 0 prefix v- ;

: n-partition ( x n -- seq ) steps rounded differences ;

To some extent, this is a question of aesthetics. I happen to like this last version (that uses vectors) the best, but I can see reasons why the original cleaned up version might be attractive too.

1 comment:

  1. Very nice to see the progression of refactoring your code. I too prefer the final vector-based version...short words with concise definitions that are easy to follow without too many stack shufflers. Great writeup!

    ReplyDelete