When dealing with sequences, it can be useful to "group" them based on some criteria into smaller sequences. I couldn't find a word that quite solved my problem in Factor, so I wrote `group-by`

:

USING: assocs kernel sequences ; : group-by ( seq quot: ( elt -- key ) -- assoc ) H{ } clone [ [ push-at ] curry compose [ dup ] prepose each ] keep ; inline

## Examples

For example, we could use it to split the first 20 numbers into two groups based on whether they are prime or not:

( scratchpad ) USE: math.primes ( scratchpad ) 20 iota [ prime? ] group-by . H{ { f V{ 0 1 4 6 8 9 10 12 14 15 16 18 } } { t V{ 2 3 5 7 11 13 17 19 } } }

Or, we could group the subsets of a string by their length:

( scratchpad ) USE: math.combinatorics ( scratchpad ) "abc" all-subsets [ length ] group-by . H{ { 0 V{ "" } } { 1 V{ "a" "b" "c" } } { 2 V{ "ab" "ac" "bc" } } { 3 V{ "abc" } } }

Or, group some random numbers by their bit-count:

( scratchpad ) USE: math.bitwise ( scratchpad ) 20 [ 100 random ] replicate [ bit-count ] group-by . H{ { 1 V{ 32 1 32 4 } } { 2 V{ 12 } } { 3 V{ 13 74 56 98 44 } } { 4 V{ 83 30 45 46 75 77 } } { 5 V{ 61 59 61 } } { 6 V{ 63 } } }

## How it works

The Factor code is roughly equivalent to the following Python code:

from collections import defaultdict def group_by(seq, f): d = defaultdict(list) for value in seq: key = f(value) d[key].append(value) return d

Let's take it step-by-step. First, start defining a word (function) called `group-by`

.

: group-by

Next, define it to take two arguments, a sequence (list or array) of values and a quotation (anonymous function or lambda) that computes a key for each element, and outputs an assoc (association, map, or dict). Names here are used only for documentation, it could take a `foo`

and `bar`

and return a `baz`

.

( seq quot: ( elt -- key ) -- assoc )

The code inside the word is everything until the "`;`

". We want to output a hashtable, so we first create one by cloning an empty hashtable (`H{ }`

).

H{ } clone

We will compose a word that duplicates each element to compute a key that is used to push each element into an appropriate bucket (a vector) in the hashtable. The `push-at`

word has the signature `( value key assoc -- )`

. For example, if grouping by the length of a string, we want something that looks a bit like `[ dup length H{ } push-at ]`

:

[ push-at ] curry compose [ dup ] prepose

We then, apply this quotation to each element in the sequence:

`each`

And, finally, we want to make sure that the hashtable that we created isn't "consumed", but kept on the stack as a return value.

[ ... ] keep ;

## 6 comments:

For those of us who don't understand Factor, can you explain word by word what your group-by function is doing?

Hi Laurie,

I updated the post with an explanation of how group-by works. Perhaps you can take a look and let me know if it helps.

Thanks!

Hi mrjbq7!

Have you seen the 'partition' word?

dharmatech

Yeah, "partition" is a good word if you only want two groups.

For my particular use case (an upcoming blog post, actually), I had a number of tuples (objects) that I wanted to group according to one of its slots (attributes).

This version might be easier to read:

: group-by ( seq quot: ( elt -- key ) -- hashtable )

'[ [ dup @ ] dip push-at ] sequence>hashtable ; inline

The sequence>hashtable combinator outputs a hashtable after calling the quotation for each element of the sequence with the stack effect ( elt hashtable -- ). So you dip under the hashtable, dup the element and call your quotation to get a key, then push it onto a vector stored in the hashtable.

Ah... I should've read the examples more carefully. 8-)

Post a Comment