Friday, February 4, 2011

Maximum/Minimum

For some Factor code I was writing recently, I needed to find the "maximum" and "minimum" of some sequences of values. The "obvious" way to do this is to use the supremum and infimum words from the sequences vocabulary. Those words were not quite what I wanted, so I thought I would write some thoughts about it:

First, an example to show how they work:

( scratchpad ) { 1 2 3 4 } supremum .
4

( scratchpad ) { 1 2 3 4 } infimum .
1

Let's say we create a person tuple with a name and age.

TUPLE: person name age ;

Now, let's say we have a list of people (e.g., a sequence of person objects):

CONSTANT: PEOPLE {
    T{ person f "Jim" 30 }
    T{ person f "Sally" 27 }
    T{ person f "Rebecca" 32 }
    T{ person f "James" 28 }
    T{ person f "Benjamin" 22 }
}

What if we'd like to find out who the oldest person is? We could try using the supremum word:

( scratchpad ) PEOPLE supremum
Generic word <=> does not define a method for the person class.
Dispatching on object: T{ person f "Sally" 27 }

Whoops! The supremum word uses the max generic word, which in turn defaults to using the after? generic word, which in turn defaults to using the <=> generic word (from the math.order vocabulary) to compare two values for +lt+, +eq+, or +gt+.

So, let's define that word for the person types:

M: person <=> [ age>> ] bi@ <=> ;

Okay, let's try again:

( scratchpad ) PEOPLE supremum .
T{ person f "Rebecca" 32 }

Alright, now we also want to find the person with the shortest name. Well, we could use infimum, but it, like supremum, delegates to <=> which (from our previous definition) currently orders by a person's age. Let's redefine <=> to compare based on length of a person's name:

M: person <=> [ name>> length ] bi@ <=> ;

Okay, let's try it now:

( scratchpad ) PEOPLE infimum .
T{ person f "Jim" 30 }

It worked! But, this is quite awkward to change the definition of <=> for each purpose. The problem here is that supremum and infimum (and the <=> word they use) expect to be used for the natural ordering of objects, not for arbitrary orderings.

Instead, we can define two new words, max-by and min-by, that use a provided quotation to obtain a value that is then used to compare two objects (e.g., in our first example, age, and in our second, the length of name).

: max-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
    [ bi@ [ max ] keep eq? not ] curry most ; inline

: min-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
    [ bi@ [ min ] keep eq? not ] curry most ; inline

With these, we can define the words, maximum and minimum, which use these to retrieve the object with "maximum" or "minimum" characteristics (using the specified quotation).

: maximum ( seq quot: ( ... elt -- ... n ) -- elt )
    [ keep 2array ] curry
    [ [ first ] max-by ] map-reduce second ; inline

: minimum ( seq quot: ( ... elt -- ... n ) -- elt )
    [ keep 2array ] curry
    [ [ first ] min-by ] map-reduce second ; inline

Using these, we can easily find our answers:

( scratchpad ) PEOPLE [ age>> ] maximum .
T{ person f "Rebecca" 32 }

( scratchpad ) PEOPLE [ name>> length ] minimum .
T{ person f "Jim" 30 }

It's worth noting that we could define both supremum and infimum in terms of our new words:

: supremum ( seq -- elt ) [ ] maximum ;

: infimum ( seq -- elt ) [ ] minimum ;

But, because of the wrapping and unwrapping that we do inside of maximum and minimum, it's a little less efficient than the current implementation.

3 comments:

_phred said...

You got me thinking:

http://weblog.fredalger.net/2011/02/sorting-in-factor.html

Thinking: always a good thing!

paul said...

An easier approach is the "decorate sort undecorate" idiom - convert each sequence entry into a { key, value } array, sort that (or use supremum, or whatever) then "undecorate" by extracting the value.



: decorate ( seq quot: ( key -- val ) -- decorated-seq ) '[ _ keep 2array ] map ; inline


PEOPLE [ age>> ] decorate supremum second


For sorting, you can use an "undecorate" word:


: undecorate ( kv-seq -- seq ) [ second ] map ;


(Is there any way of formatting code in comments???)

mrjbq7 said...

@_phred: Nice article!

@paul: I like your decorate/undecorate -- I recognize its a common idiom in languages like Python, too. The main advantage of a sequential approach is that it can operate in O(n) (e.g., linear) time. Often complexity is not a constraint, but sometimes it is.