Sunday, July 25, 2010

New Combinatoric Functions

Combinatorics can provide some useful functions when working with sequences. In Factor, these are mostly defined in the math.combinatorics vocabulary.

USE: math.combinatorics

Inspired by some functions from clojure.contrib, I recently contributed two additional combinatoric words to the Factor project (although not with the same lazy semantics that the Clojure version has).

all-subsets:

The first word, all-subsets, returns all subsets of a given sequence. This can be calculated by iteratively taking n combinations of items from the sequence, where n goes from 0 (the empty set) to length (the sequence itself).

First, we observe how this works by experimenting with the all-combinations word:

( scratchpad ) { 1 2 } 0 all-combinations .
{ { } }

( scratchpad ) { 1 2 } 1 all-combinations .
{ { 1 } { 2 } }

( scratchpad ) { 1 2 } 2 all-combinations .
{ { 1 2 } }

By running it with various n, we have produced all of the subsets of the { 1 2 } sequence. Using a [0,b] range (from 0 to the length of the sequence), we make a sequence of subsets:

: all-subsets ( seq -- subsets )
    dup length [0,b] [
        [ dupd all-combinations [ , ] each ] each
    ] { } make nip ;

The all-subsets word can then be demonstrated by:

( scratchpad ) { 1 2 3 } all-subsets .
{ { } { 1 } { 2 } { 3 } { 1 2 } { 1 3 } { 2 3 } { 1 2 3 } }

selections:

Another useful function, selections, returns all the ways of taking n (possibly the same) elements from a sequence.

First, we observe that there are two base cases:

  1. If we want all ways of taking 0 elements from the sequence, we have only { } (the empty sequence).
  2. If we want all ways of taking 1 element from the sequence, we essentially have a sequence for each element in the input sequence.

If we take more elements from the sequence, we need to apply the cartesian-product word (which returns all possible pairs of elements from two sequences) n-1 times. For example, if we wanted to see all possible selections of 2 elements from a sequence, run the cartesian-product once:

( scratchpad ) { 1 2 3 } dup cartesian-product concat .
{
    { 1 1 }
    { 1 2 }
    { 1 3 }
    { 2 1 }
    { 2 2 }
    { 2 3 }
    { 3 1 }
    { 3 2 }
    { 3 3 }
}

Using these observations, we can build the selections word:

: (selections) ( seq n -- selections )
    dupd [ dup 1 > ] [
        swap pick cartesian-product [
            [ [ dup length 1 > [ flatten ] when , ] each ] each
        ] { } make swap 1 -
    ] while drop nip ;

: selections ( seq n -- selections )
    {
        { 0 [ drop { } ] }
        { 1 [ 1array ] }
        [ (selections) ]
    } case ;

This can be demonstrated by:

( scratchpad ) { 1 2 } 2 selections .
{ { 1 1 } { 1 2 } { 2 1 } { 2 2 } }

Note: we have defined this to take element order into account, so { 1 2 } and { 2 1 } are different possible results. Also, it could be argued that the result for { 1 2 3 } 1 selections should be { 1 2 3 } [ 1array ] map -- perhaps it should change to that in the future.

This was committed to the main repository recently.

Update: A comment by Jon Harper showed me a way to improve all-subsets. Based on that, I also made some changes to selections (perhaps it could be improved even more):

: all-subsets ( seq -- subsets )
    dup length [0,b] [ all-combinations ] with map concat ;

: (selections) ( seq n -- selections )
    [ [ 1array ] map dup ] [ 1 - ] bi* [
        cartesian-product concat [ { } concat-as ] map
    ] with times ;

: selections ( seq n -- selections )
    dup 0 > [ (selections) ] [ 2drop { } ] if ;

7 comments:

Jon said...
This comment has been removed by the author.
Jon said...
This comment has been removed by the author.
Jon said...

Hi, what do you think of the following definition of all-subsets? It seems to be a bit faster and shorter than the one you committed.

: subsets ( set -- subsets )
dup length [0,b] [ all-combinations ] with map concat ;

Jon

(sry for posting 3 times, I'm not a morning person)

mrjbq7 said...

I like it. I took some of your idea and improved selections too - think we can do better still?

Slava Pestov said...

John, I just wanted to say I really appreciate your blog posts about Factor.

Jon said...

Actually, "selections" is almost already in the library, in the sequences.product vocabulary :

See here : http://paste.factorcode.org/paste?id=1803

I wonder if this is useful though, because if you use "each" or "map" on the result, you might as well use product-map (or product-each) directly..

mrjbq7 said...

Interesting, I didn't know about the "sequences.product" vocab. When using selections with each and map, you're probably right that its better. I wonder if it is worth changing the implementation to one of the ones you suggest, or dropping it?