For some Factor code I was writing recently, I needed to find the "maximum" and "minimum" of some sequences of values. The "obvious" way to do this is to use the supremum and infimum words from the sequences
vocabulary. Those words were not quite what I wanted, so I thought I would write some thoughts about it:
First, an example to show how they work:
( scratchpad ) { 1 2 3 4 } supremum . 4 ( scratchpad ) { 1 2 3 4 } infimum . 1
Let's say we create a person
tuple with a name and age.
TUPLE: person name age ;
Now, let's say we have a list of people (e.g., a sequence of person
objects):
CONSTANT: PEOPLE { T{ person f "Jim" 30 } T{ person f "Sally" 27 } T{ person f "Rebecca" 32 } T{ person f "James" 28 } T{ person f "Benjamin" 22 } }
What if we'd like to find out who the oldest person is? We could try using the supremum
word:
( scratchpad ) PEOPLE supremum Generic word <=> does not define a method for the person class. Dispatching on object: T{ person f "Sally" 27 }
Whoops! The supremum
word uses the max
generic word, which in turn defaults to using the after?
generic word, which in turn defaults to using the <=>
generic word (from the math.order
vocabulary) to compare two values for +lt+
, +eq+
, or +gt+
.
So, let's define that word for the person
types:
M: person <=> [ age>> ] bi@ <=> ;
Okay, let's try again:
( scratchpad ) PEOPLE supremum . T{ person f "Rebecca" 32 }
Alright, now we also want to find the person with the shortest name. Well, we could use infimum
, but it, like supremum
, delegates to <=>
which (from our previous definition) currently orders by a person's age
. Let's redefine <=>
to compare based on length of a person's name
:
M: person <=> [ name>> length ] bi@ <=> ;
Okay, let's try it now:
( scratchpad ) PEOPLE infimum . T{ person f "Jim" 30 }
It worked! But, this is quite awkward to change the definition of <=>
for each purpose. The problem here is that supremum
and infimum
(and the <=>
word they use) expect to be used for the natural ordering of objects, not for arbitrary orderings.
Instead, we can define two new words, max-by
and min-by
, that use a provided quotation
to obtain a value that is then used to compare two objects (e.g., in our first example, age
, and in our second, the length of name
).
: max-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 ) [ bi@ [ max ] keep eq? not ] curry most ; inline : min-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 ) [ bi@ [ min ] keep eq? not ] curry most ; inline
With these, we can define the words, maximum
and minimum
, which use these to retrieve the object with "maximum" or "minimum" characteristics (using the specified quotation).
: maximum ( seq quot: ( ... elt -- ... n ) -- elt ) [ keep 2array ] curry [ [ first ] max-by ] map-reduce second ; inline : minimum ( seq quot: ( ... elt -- ... n ) -- elt ) [ keep 2array ] curry [ [ first ] min-by ] map-reduce second ; inline
Using these, we can easily find our answers:
( scratchpad ) PEOPLE [ age>> ] maximum . T{ person f "Rebecca" 32 } ( scratchpad ) PEOPLE [ name>> length ] minimum . T{ person f "Jim" 30 }
It's worth noting that we could define both supremum
and infimum
in terms of our new words:
: supremum ( seq -- elt ) [ ] maximum ; : infimum ( seq -- elt ) [ ] minimum ;
But, because of the wrapping and unwrapping that we do inside of maximum
and minimum
, it's a little less efficient than the current implementation.
You got me thinking:
ReplyDeletehttp://weblog.fredalger.net/2011/02/sorting-in-factor.html
Thinking: always a good thing!
An easier approach is the "decorate sort undecorate" idiom - convert each sequence entry into a { key, value } array, sort that (or use supremum, or whatever) then "undecorate" by extracting the value.
ReplyDelete: decorate ( seq quot: ( key -- val ) -- decorated-seq ) '[ _ keep 2array ] map ; inline
PEOPLE [ age>> ] decorate supremum second
For sorting, you can use an "undecorate" word:
: undecorate ( kv-seq -- seq ) [ second ] map ;
(Is there any way of formatting code in comments???)
@_phred: Nice article!
ReplyDelete@paul: I like your decorate/undecorate -- I recognize its a common idiom in languages like Python, too. The main advantage of a sequential approach is that it can operate in O(n) (e.g., linear) time. Often complexity is not a constraint, but sometimes it is.