## Friday, February 4, 2011

### Maximum/Minimum

For some Factor code I was writing recently, I needed to find the "maximum" and "minimum" of some sequences of values. The "obvious" way to do this is to use the supremum and infimum words from the `sequences` vocabulary. Those words were not quite what I wanted, so I thought I would write some thoughts about it:

First, an example to show how they work:

```( scratchpad ) { 1 2 3 4 } supremum .
4

( scratchpad ) { 1 2 3 4 } infimum .
1```

Let's say we create a `person` tuple with a name and age.

`TUPLE: person name age ;`

Now, let's say we have a list of people (e.g., a sequence of `person` objects):

```CONSTANT: PEOPLE {
T{ person f "Jim" 30 }
T{ person f "Sally" 27 }
T{ person f "Rebecca" 32 }
T{ person f "James" 28 }
T{ person f "Benjamin" 22 }
}```

What if we'd like to find out who the oldest person is? We could try using the `supremum` word:

```( scratchpad ) PEOPLE supremum
Generic word <=> does not define a method for the person class.
Dispatching on object: T{ person f "Sally" 27 }```

Whoops! The `supremum` word uses the `max` generic word, which in turn defaults to using the `after?` generic word, which in turn defaults to using the `<=>` generic word (from the `math.order` vocabulary) to compare two values for `+lt+`, `+eq+`, or `+gt+`.

So, let's define that word for the `person` types:

`M: person <=> [ age>> ] bi@ <=> ;`

Okay, let's try again:

```( scratchpad ) PEOPLE supremum .
T{ person f "Rebecca" 32 }```

Alright, now we also want to find the person with the shortest name. Well, we could use `infimum`, but it, like `supremum`, delegates to `<=>` which (from our previous definition) currently orders by a person's `age`. Let's redefine `<=>` to compare based on length of a person's `name`:

`M: person <=> [ name>> length ] bi@ <=> ;`

Okay, let's try it now:

```( scratchpad ) PEOPLE infimum .
T{ person f "Jim" 30 }```

It worked! But, this is quite awkward to change the definition of `<=>` for each purpose. The problem here is that `supremum` and `infimum` (and the `<=>` word they use) expect to be used for the natural ordering of objects, not for arbitrary orderings.

Instead, we can define two new words, `max-by` and `min-by`, that use a provided `quotation` to obtain a value that is then used to compare two objects (e.g., in our first example, `age`, and in our second, the length of `name`).

```: max-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
[ bi@ [ max ] keep eq? not ] curry most ; inline

: min-by ( obj1 obj2 quot: ( obj -- n ) -- obj1/obj2 )
[ bi@ [ min ] keep eq? not ] curry most ; inline```

With these, we can define the words, `maximum` and `minimum`, which use these to retrieve the object with "maximum" or "minimum" characteristics (using the specified quotation).

```: maximum ( seq quot: ( ... elt -- ... n ) -- elt )
[ keep 2array ] curry
[ [ first ] max-by ] map-reduce second ; inline

: minimum ( seq quot: ( ... elt -- ... n ) -- elt )
[ keep 2array ] curry
[ [ first ] min-by ] map-reduce second ; inline```

Using these, we can easily find our answers:

```( scratchpad ) PEOPLE [ age>> ] maximum .
T{ person f "Rebecca" 32 }

( scratchpad ) PEOPLE [ name>> length ] minimum .
T{ person f "Jim" 30 }```

It's worth noting that we could define both `supremum` and `infimum` in terms of our new words:

```: supremum ( seq -- elt ) [ ] maximum ;

: infimum ( seq -- elt ) [ ] minimum ;```

But, because of the wrapping and unwrapping that we do inside of `maximum` and `minimum`, it's a little less efficient than the current implementation.

_phred said...

You got me thinking:

http://weblog.fredalger.net/2011/02/sorting-in-factor.html

Thinking: always a good thing!

paul said...

An easier approach is the "decorate sort undecorate" idiom - convert each sequence entry into a { key, value } array, sort that (or use supremum, or whatever) then "undecorate" by extracting the value.

: decorate ( seq quot: ( key -- val ) -- decorated-seq ) '[ _ keep 2array ] map ; inline

PEOPLE [ age>> ] decorate supremum second

For sorting, you can use an "undecorate" word:

: undecorate ( kv-seq -- seq ) [ second ] map ;

(Is there any way of formatting code in comments???)

mrjbq7 said...

@_phred: Nice article!

@paul: I like your decorate/undecorate -- I recognize its a common idiom in languages like Python, too. The main advantage of a sequential approach is that it can operate in O(n) (e.g., linear) time. Often complexity is not a constraint, but sometimes it is.