Monday, May 24, 2010

Evenly Partition an Integer

Update: I just noticed after re-watching Slava's Google Tech Talk that he discusses his vector version of this word (which I originally was calling "distribute") at 34:20. Very cool.

I came across a question on Stack Overflow about how to evenly divide an integer.

The discussion was not particularly interesting, but it reminded me of a similar problem that I had to solve, and decided to implement in Factor. This was early in my experience with Factor, and the conversations on IRC proved helpful for understanding how to structure my Factor code, which I want to share.

This type of problem is part of number theory. There is some discussion on Wolfram Mathworld describing the various ways that a positive integer can be partitioned into the sum of some number of other positive integers.

In my particular case, I needed an even division with the result to be produced in such a way that the rounding affect was distributed across the result set. For example:

( scratchpad ) 3 1 n-partition .
{ 3 }

( scratchpad ) 3 3 n-partition .
{ 1 1 1 }

( scratchpad ) 5 3 n-partition .
{ 2 1 2 }

( scratchpad ) 3 5 n-partition .
{ 1 0 1 0 1 }

In an applicative language like Python, you might write this algorithm like so:

def n_partition(x, n):
    delta = float(x) / n
    last, total = 0, 0.0
    l = []
    for _ in range(n):
        total += delta
        value = int(round(total))
        l.append(value - last)
        last = value
    return l

My first version in Factor solved the problem, but was terribly ugly and non-idiomatic:

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip [
        + [ [ dup ] dip + ] dip  
        [ dup round ] dip 2dup -
        [ drop ] dip 
    ] map nip nip nip ;

One sign that you are "doing it wrong" in Factor is frequent usage of stack manipulation using shuffle words. Another sign is creating large words that would be better when broken up into smaller pieces. My solution had both characteristics. Worse, I could barely read it.

The Factor developers are very responsive to questions from beginners, and are helpful in providing an experienced eye to guide newcomers in learning better ways to write their programs. You can reach them on the mailing list or on the #concatenative IRC channel. Often it is useful to paste code so that others can see and comment on it. I thought I'd ask for help, and see if I could learn how to improve.

I received some iterative feedback showing each step along the way towards improving the original code, by using higher-level words and concepts:

  1. First, extract the logic inside map into its own word:
: (n-partition) ( a b c d -- a b c d )
    + [ [ dup ] dip + ] dip  
    [ dup round ] dip 2dup -
    [ drop ] dip ; 

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Instead of the first +, use drop since there is always a zero there.
  2. Instead of [ dup ] dip, use dupd.
  3. Instead of [ drop ] dip, use nip.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + ] dip [ dup round ] dip 2dup - nip ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Realize that [ foo ] dip [ bar ] dip is the same as [ foo bar ] dip.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + dup round ] dip 2dup - nip ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Realize that 2dup - nip is dupd -.
  2. Realize that dupd is [ dup ] dip and apply the [ foo bar ] dip rule again.
: (n-partition) ( a b c d -- a b c d )
    drop [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip
    [ (n-partition) ] map nip nip nip ;
  1. Now, extract the drop out and put it inside the map.
: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip 
    [ drop (n-partition) ] map nip nip nip ;
  1. Realize that [ drop foo ] map is [ foo ] replicate.
: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq ) 
    [ / ] keep 0 <array> [ 0 0 ] dip 
    [ (n-partition) ] replicate nip nip nip ;
  1. Now, replicate doesn't care what elements of the sequence are, only its length, so you may as well be using "n" instead of an array of "n" zeroes, so we can remove the 0 <array>.
  2. Finally, [ foo ] keep [ bar ] dip is the same as [ foo bar ] keep.

This produces our final "cleaned up" version:

: (n-partition) ( a b c -- a b c d )
    [ dupd + dup round dup ] dip - ;

: n-partition ( x n -- seq )
    [ / 0 0 ] keep [ (n-partition) ] replicate nip nip nip ;

Afterwards, I discussed with Slava other alternatives. One idea was to use locals for the (n-partition) word, but it was not much of an improvement by itself (perhaps better if we introduced meaningful names instead of a b c d):

:: (n-partition) ( a b c -- a b c d )
    a a b + dup round dup c - ;

Another idea was to work on an array of values and then apply vector arithmetic on it:

: percentages ( n -- seq ) [ [1,b] ] keep v/n ;

: steps ( x n -- seq ) percentages n*v ;

: rounded ( seq -- seq' ) [ round ] map ;

: differences ( seq -- seq' ) dup 0 prefix v- ;

: n-partition ( x n -- seq ) steps rounded differences ;

To some extent, this is a question of aesthetics. I happen to like this last version (that uses vectors) the best, but I can see reasons why the original cleaned up version might be attractive too.

Monday, May 17, 2010

Creating Fake Data

A few days ago there was a post on Hacker News about a project called Faker.js. Basically, the project is a Javascript clone of libraries in Ruby and Perl for creating fake data for names, phone numbers, e-mails, street addresses, company information, etc.

I took a look at the code, and got inspired to create a Factor version.

First, some useful vocabularies:

USING: ascii combinators fry kernel make math random sequences ;

All three implementations seem to use simple random selection to generate the "fake" information. For example, each has a long list of valid first and last names. It looks sort of like:

CONSTANT: FIRST-NAME {
    "Aaliyah" "Aaron" "Abagail" "Abbey" "Abbie" "Abbigail"
    "Abby" "Abdiel" "Abdul" "Abdullah" "Abe" "Abel" "Abelardo"
    "Abigail" "Abigale" "Abigayle" "Abner" "Abraham" "Ada"
    "Adah" "Adalberto" "Adaline" "Adam" "Adan" "Addie" "Addison"
    "Adela" "Adelbert" "Adele" "Adelia" "Adeline" "Adell"
    ...

CONSTANT: LAST-NAME {
    "Abbott" "Abernathy" "Abshire" "Adams" "Altenwerth"
    "Anderson" "Ankunding" "Armstrong" "Auer" "Aufderhar"
    "Bahringer" "Bailey" "Balistreri" "Barrows" "Bartell"
    "Bartoletti" "Barton" "Bashirian" "Batz" "Bauch" "Baumbach"
    ...

Given a long list of possible names, generating a fake name is no more complicated than:

( scratchpad ) FIRST-NAME random LAST-NAME random " " glue .
"Greyson Barrows"

Similarly, creating phone numbers is no more complicated than a list of typical phone number patterns, combined with a word that performs substitution of random numbers:

CONSTANT: PHONE-NUMBER {
    "###-###-####"
    "(###)###-####"
    "1-###-###-####"
    "###.###.####"
    "###-###-####"
    "(###)###-####"
    "1-###-###-####"
    ...

: (numbers) ( str -- str' )
    [ dup CHAR: # = [ drop "0123456789" random ] when ] map ;

Generating a fake phone number (without performing any kind of validation on area codes or local numbers) is as easy as:

( scratchpad ) PHONE-NUMBER random (numbers) .
"352-327-9815"

For added flavor, the author chose to include "business bullshit" generation:

( scratchpad ) 5 [ fake-bs . ] times
"leverage 24/7 models"
"deploy ubiquitous vortals"
"maximize holistic channels"
"exploit real-time niches"
"unleash proactive mindshare"

And "product catch phrase" generation:

( scratchpad ) 5 [ fake-catch-phrase . ] times
"Reverse-engineered value-added toolset"
"Diverse systemic concept"
"Ergonomic holistic pricing structure"
"Persevering local interface"
"Intuitive human-resource time-frame"

Useful for scale testing websites, practical jokes, and probably less innocent purposes. You can see the full version on my GitHub account.

Friday, May 14, 2010

Time Spans for Humans

When dealing with time spans or durations, it is nice to be able to render elapsed time in a way that is easy for the human eye to read.

In some cases, this could mean translating the way you might say it or write it (two years, ten days, four hours, thirty minutes, and fifteen seconds) or a more concise form (2y 10d 4h 30m 15s). The concise form has become popular in some web applications, and solutions exist for both in most languages including python, perl, and java. I thought I would build a word in Factor to calculate it.

First, we need some other vocabularies:

USING: kernel make math math.parser sequences ;

The make vocabulary can be used to execute a quotation that builds a sequence of values at runtime. This can be a little confusing for those new to Factor. Basically, you provide it a quotation that periodically emits values (by calling the , word) that are collected into the sequence type you provide. For example:

( scratchpad ) [ 1 , 2 , 3 , ] { } make .
{ 1 2 3 }

Next, we need an algorithm. One way is to iteratively divmod the number of seconds into each category (e.g., years, weeks, days, hours, minutes, seconds) and count it if the number in the category is non-zero.

: elapsed-time ( seconds -- string )
    dup 0 < [ "negative seconds" throw ] when [
        {
            { 60 "s" }
            { 60 "m" }
            { 24 "h" }
            {  7 "d" }
            { 52 "w" }
            {  f "y" }
        } [
            [ first [ /mod ] [ dup ] if* ] [ second ] bi swap
            dup 0 > [ number>string prepend , ] [ 2drop ] if
        ] each drop
    ] { } make [ "0s" ] [ reverse " " join ] if-empty ;

And then to see it work:

( scratchpad ) 123456 elapsed-time .
"1d 10h 17m 36s"

Thursday, May 13, 2010

Relative Time for Humans

Many applications deal with events that have occurred at different points in time. Often, these times are relatively recent, such as a list of emails in an inbox or a list of status updates on a social network. When we talk about when an event occurred, it is sometimes useful to be precise (e.g., Thu May 13 19:12:02 2010), and other times it is better to be a bit softer (e.g., about a minute ago).

This function gets built from time-to-time in various languages: for example in Python and Javascript. I thought it might be fun to see how simple it could be in Factor.

A very useful type of conditional combinator is called cond. It applies to a sequence of quotations. The first elements in the sequence are pairs of quotations. The combinator proceeds to call the first quotation in each pair until a true value is yielded, at which point it calls the second quotation in that pair and stops iterating. A final quotation can be provided to handle the case where none of the pairs returned a true value.

In typical applicative languages, this would be provided by an "if/else if/else" chain. It is a general pattern and is used in Factor to implement case (a type of switch statement).

Using a few vocabularies:

USING: combinators formatting kernel math ;

We will write a word that takes the number of seconds that something happened in the past, and calculates how long ago it was in "human terms":

: relative-time ( seconds -- string ) dup 0 < [ "negative seconds" throw ] when { { [ dup 1 < ] [ drop "just now" ] } { [ dup 60 < ] [ drop "less than a minute ago" ] } { [ dup 120 < ] [ drop "about a minute ago" ] } { [ dup 2700 < ] [ 60 / "%d minutes ago" sprintf ] } { [ dup 5400 < ] [ drop "about an hour ago" ] } { [ dup 86400 < ] [ 3600 / "%d hours ago" sprintf ] } { [ dup 172800 < ] [ drop "1 day ago" ] } [ 86400 / "%d days ago" sprintf ] } cond ;

Fairly concise as it turns out. If it wasn't for the minor stack manipulation, you might say this was getting very close to the essence of the algorithm.