Combinatorics can provide some useful functions when working with sequences. In Factor, these are mostly defined in the math.combinatorics vocabulary.

USE: math.combinatorics

Inspired by some functions from clojure.contrib, I recently contributed two additional combinatoric words to the Factor project (although not with the same lazy semantics that the Clojure version has).

## all-subsets:

The first word, `all-subsets`

, returns all subsets of a given sequence. This can be calculated by iteratively taking `n`

combinations of items from the sequence, where `n`

goes from `0`

(the empty set) to `length`

(the sequence itself).

First, we observe how this works by experimenting with the all-combinations word:

( scratchpad ) { 1 2 } 0 all-combinations . { { } } ( scratchpad ) { 1 2 } 1 all-combinations . { { 1 } { 2 } } ( scratchpad ) { 1 2 } 2 all-combinations . { { 1 2 } }

By running it with various `n`

, we have produced all of the subsets of the `{ 1 2 }`

sequence. Using a [0,b] range (from 0 to the length of the sequence), we make a sequence of subsets:

: all-subsets ( seq -- subsets ) dup length [0,b] [ [ dupd all-combinations [ , ] each ] each ] { } make nip ;

The `all-subsets`

word can then be demonstrated by:

( scratchpad ) { 1 2 3 } all-subsets . { { } { 1 } { 2 } { 3 } { 1 2 } { 1 3 } { 2 3 } { 1 2 3 } }

## selections:

Another useful function, `selections`

, returns all the ways of taking `n`

(possibly the same) elements from a sequence.

First, we observe that there are two base cases:

- If we want all ways of taking
`0`

elements from the sequence, we have only`{ }`

(the empty sequence). - If we want all ways of taking
`1`

element from the sequence, we essentially have a sequence for each element in the input sequence.

If we take more elements from the sequence, we need to apply the cartesian-product word (which returns all possible pairs of elements from two sequences) `n-1`

times. For example, if we wanted to see all possible selections of `2`

elements from a sequence, run the cartesian-product once:

( scratchpad ) { 1 2 3 } dup cartesian-product concat . { { 1 1 } { 1 2 } { 1 3 } { 2 1 } { 2 2 } { 2 3 } { 3 1 } { 3 2 } { 3 3 } }

Using these observations, we can build the `selections`

word:

: (selections) ( seq n -- selections ) dupd [ dup 1 > ] [ swap pick cartesian-product [ [ [ dup length 1 > [ flatten ] when , ] each ] each ] { } make swap 1 - ] while drop nip ; : selections ( seq n -- selections ) { { 0 [ drop { } ] } { 1 [ 1array ] } [ (selections) ] } case ;

This can be demonstrated by:

( scratchpad ) { 1 2 } 2 selections . { { 1 1 } { 1 2 } { 2 1 } { 2 2 } }

*Note: we have defined this to take element order into account, so { 1 2 } and { 2 1 } are different possible results. Also, it could be argued that the result for { 1 2 3 } 1 selections should be { 1 2 3 } [ 1array ] map -- perhaps it should change to that in the future.*

This was committed to the main repository recently.

Update: A comment by Jon Harper showed me a way to improve `all-subsets`

. Based on that, I also made some changes to `selections`

(perhaps it could be improved even more):

: all-subsets ( seq -- subsets ) dup length [0,b] [ all-combinations ] with map concat ; : (selections) ( seq n -- selections ) [ [ 1array ] map dup ] [ 1 - ] bi* [ cartesian-product concat [ { } concat-as ] map ] with times ; : selections ( seq n -- selections ) dup 0 > [ (selections) ] [ 2drop { } ] if ;

## 7 comments:

Hi, what do you think of the following definition of all-subsets? It seems to be a bit faster and shorter than the one you committed.

: subsets ( set -- subsets )

dup length [0,b] [ all-combinations ] with map concat ;

Jon

(sry for posting 3 times, I'm not a morning person)

I like it. I took some of your idea and improved selections too - think we can do better still?

John, I just wanted to say I really appreciate your blog posts about Factor.

Actually, "selections" is almost already in the library, in the sequences.product vocabulary :

See here : http://paste.factorcode.org/paste?id=1803

I wonder if this is useful though, because if you use "each" or "map" on the result, you might as well use product-map (or product-each) directly..

Interesting, I didn't know about the "sequences.product" vocab. When using selections with each and map, you're probably right that its better. I wonder if it is worth changing the implementation to one of the ones you suggest, or dropping it?

Post a Comment