Saturday, July 23, 2011

Majority Vote

A Linear Time Majority Vote Algorithm was invented in 1980 by Bob Boyer and J Strother Moore (inventors of the popular Boyer-Moore string search algorithm). Not seeing this available in Factor, I thought to contribute one.
Note: this is also called the "Moore’s Voting Algorithm".
The algorithm simply looks at each element of the sequence:

Keep a candidate element and a counter (initially unknown and zero, respectively).

As we move across the sequence, look at each element:
  • If the counter is 0: the element is the candidate and the counter is 1.
  • If the counter is not 0: increment if the element is the candidate, decrement if not.
When we are done, the candidate is the majority element, if there is a majority.

Using this specification, we can implement the algorithm:
: majority ( seq -- elt/f )
    [ f 0 ] dip [
        over zero? [ 2nip 1 ] [ 
            pick = [ 1 + ] [ 1 - ] if
        ] if
    ] each zero? [ drop f ] when ;
A few simple tests show that this is working:
[ f ] [ { } majority ] unit-test
[ f ] [ { 1 2 } majority ] unit-test
[ 1 ] [ { 1 1 2 } majority ] unit-test
[ f ] [ { 1 1 2 2 } majority ] unit-test
[ 2 ] [ { 1 1 2 2 2 } majority ] unit-test
This is perhaps not quite idiomatic Factor, can you improve it?

6 comments:

  1. Here's my solution, but not so sure it's idiomatic:

    :: majority2 ( seq -- f/elt )
    0 :> count! f :> candidate!
    seq [ count 0 =
    [ candidate! 1 count! ]
    [ candidate =
    [ count 1 + count! ] [ count 1 - count! ] if
    ] if ] each

    count 0 = [ f ] [ candidate ] if ;

    What's more disturbing to me is that the input { 1 2 3 1 } for both of our implementations of this algorithm output f, instead of 1.

    I don't know why.

    ReplyDelete
  2. I looked at the page that you linked to, and the description matches your own description of the algorithm above too. However, I feel that there's something else missing.

    I found that the input { 1 1 1 2 2 3 3 } also gives a wonky result. Is this algorithm not always 100% (similar perhaps to a bloom filter which has a % change to give a false positives)?

    ReplyDelete
  3. http://stackoverflow.com/questions/780937/linear-time-voting-algorithm-i-dont-get-it

    mrjbq7, I didn't read the assumptions carefully enough! There needs to be a MAJORITY (duh!) element in the entire sequence of at greater than N/2.

    I mistakenly thought that majority meant "highest frequency group", which is a different problem altogether.

    Sorry for any confusion.

    ReplyDelete
  4. @otoburb: I think I just realized why. One of the requirements that I didn't notice is that the sequence has an element that exists more than N/2 times.

    So, its not just the "most common element" (which I had originally thought), but actually the "majority element", meaning more than half the sequence consists of the element.

    ReplyDelete
  5. Just for fun using tuple and reduce.

    Shorter
    [ [ 1 + ] ] [ [ 1 - ] ] if change-n
    fails with Cannot apply “call” to a run-time computed value, sadly.

    ReplyDelete
  6. Here's my crack at it:

    :: (majority) ( candidate count elt -- candidate count )
    {
    { [ count 0 = ] [ elt 1 ] }
    { [ candidate elt = ] [ candidate count 1 + ] }
    [ candidate count 1 - ]
    } cond ;

    : majority ( seq -- elt ) [ f 0 ] dip [ (majority) ] each drop ;

    ReplyDelete