Update: I just noticed after re-watching Slava's Google Tech Talk that he discusses his vector version of this word (which I originally was calling "distribute") at 34:20. Very cool.
I came across a question on Stack Overflow about how to evenly divide an integer.
The discussion was not particularly interesting, but it reminded me of a similar problem that I had to solve, and decided to implement in Factor. This was early in my experience with Factor, and the conversations on IRC proved helpful for understanding how to structure my Factor code, which I want to share.
This type of problem is part of number theory. There is some discussion on Wolfram Mathworld describing the various ways that a positive integer can be partitioned into the sum of some number of other positive integers.
In my particular case, I needed an even division with the result to be produced in such a way that the rounding affect was distributed across the result set. For example:
( scratchpad ) 3 1 n-partition . { 3 } ( scratchpad ) 3 3 n-partition . { 1 1 1 } ( scratchpad ) 5 3 n-partition . { 2 1 2 } ( scratchpad ) 3 5 n-partition . { 1 0 1 0 1 }
In an applicative language like Python, you might write this algorithm like so:
def n_partition(x, n): delta = float(x) / n last, total = 0, 0.0 l = [] for _ in range(n): total += delta value = int(round(total)) l.append(value - last) last = value return l
My first version in Factor solved the problem, but was terribly ugly and non-idiomatic:
: n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ + [ [ dup ] dip + ] dip [ dup round ] dip 2dup - [ drop ] dip ] map nip nip nip ;
One sign that you are "doing it wrong" in Factor is frequent usage of stack manipulation using shuffle words. Another sign is creating large words that would be better when broken up into smaller pieces. My solution had both characteristics. Worse, I could barely read it.
The Factor developers are very responsive to questions from beginners, and are helpful in providing an experienced eye to guide newcomers in learning better ways to write their programs. You can reach them on the mailing list or on the #concatenative IRC channel. Often it is useful to paste code so that others can see and comment on it. I thought I'd ask for help, and see if I could learn how to improve.
I received some iterative feedback showing each step along the way towards improving the original code, by using higher-level words and concepts:
- First, extract the logic inside
map
into its own word:
: (n-partition) ( a b c d -- a b c d ) + [ [ dup ] dip + ] dip [ dup round ] dip 2dup - [ drop ] dip ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ (n-partition) ] map nip nip nip ;
- Instead of the first
+
, usedrop
since there is always a zero there. - Instead of
[ dup ] dip
, usedupd
. - Instead of
[ drop ] dip
, usenip
.
: (n-partition) ( a b c d -- a b c d ) drop [ dupd + ] dip [ dup round ] dip 2dup - nip ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ (n-partition) ] map nip nip nip ;
- Realize that
[ foo ] dip [ bar ] dip
is the same as[ foo bar ] dip
.
: (n-partition) ( a b c d -- a b c d ) drop [ dupd + dup round ] dip 2dup - nip ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ (n-partition) ] map nip nip nip ;
- Realize that
2dup - nip
isdupd -
. - Realize that
dupd
is[ dup ] dip
and apply the[ foo bar ] dip
rule again.
: (n-partition) ( a b c d -- a b c d ) drop [ dupd + dup round dup ] dip - ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ (n-partition) ] map nip nip nip ;
- Now, extract the
drop
out and put it inside themap
.
: (n-partition) ( a b c -- a b c d ) [ dupd + dup round dup ] dip - ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ drop (n-partition) ] map nip nip nip ;
- Realize that
[ drop foo ] map
is[ foo ] replicate
.
: (n-partition) ( a b c -- a b c d ) [ dupd + dup round dup ] dip - ; : n-partition ( x n -- seq ) [ / ] keep 0 <array> [ 0 0 ] dip [ (n-partition) ] replicate nip nip nip ;
- Now,
replicate
doesn't care what elements of the sequence are, only its length, so you may as well be using "n" instead of an array of "n" zeroes, so we can remove the0 <array>
. - Finally,
[ foo ] keep [ bar ] dip
is the same as[ foo bar ] keep
.
This produces our final "cleaned up" version:
: (n-partition) ( a b c -- a b c d ) [ dupd + dup round dup ] dip - ; : n-partition ( x n -- seq ) [ / 0 0 ] keep [ (n-partition) ] replicate nip nip nip ;
Afterwards, I discussed with Slava other alternatives. One idea was to use locals for the (n-partition)
word, but it was not much of an improvement by itself (perhaps better if we introduced meaningful names instead of a b c d
):
:: (n-partition) ( a b c -- a b c d ) a a b + dup round dup c - ;
Another idea was to work on an array of values and then apply vector arithmetic on it:
: percentages ( n -- seq ) [ [1,b] ] keep v/n ; : steps ( x n -- seq ) percentages n*v ; : rounded ( seq -- seq' ) [ round ] map ; : differences ( seq -- seq' ) dup 0 prefix v- ; : n-partition ( x n -- seq ) steps rounded differences ;
To some extent, this is a question of aesthetics. I happen to like this last version (that uses vectors) the best, but I can see reasons why the original cleaned up version might be attractive too.
Very nice to see the progression of refactoring your code. I too prefer the final vector-based version...short words with concise definitions that are easy to follow without too many stack shufflers. Great writeup!
ReplyDelete