Wednesday, February 27, 2013

Faster "shuffle"

A funny comment on a Reddit discussion of my last post about "shuffle" said:

"Turtle races are fun indeed."

Improvements

Well, that is perhaps true given the performance of Factor, Python, and Ruby compared with implementations in C (or SBCL and LuaJIT). However, I'm happy to say that as of today, the performance of randomize (the standard library "shuffle" word) in Factor is greatly improved.

IN: scratchpad 10,000,000 iota >array
Before:
IN: scratchpad gc [ randomize ] time
Running time: 2.78315852 seconds
After:
IN: scratchpad gc [ randomize ] time
Running time: 1.373153681 seconds

Commits

So, what changed?

Well, while investigating the overhead of Factor versus C, I fixed a few things:

Looking at other programming languages is useful to know where your performance can improve and by approximately how much. Since Factor is an attempt at making a high level language that produces relatively fast code, I hope someday to turn this "turtle" into a "hare". Every little bit helps!

Tuesday, February 26, 2013

Fast "shuffle"

Randomly shuffling the elements in an array is a pretty common task. The standard algorithm for this is called the Fisher-Yates shuffle. The modern version of it looks like this in pseudo-code:

To shuffle an array a of n elements (indices 0..n-1):
  for i from n − 1 downto 1 do
       j ← random integer with 0 ≤ ji
       exchange a[j] and a[i]

We're going to implement this in Factor, and then look at improving the performance when shuffling an array of 10,000,000 numbers (generated with "10,000,000 iota >array"). Afterwards, we are going to look at Ruby and Python versions.

Factor

Version 1:

Our first implementation is a straight-forward translation of the algorithm (a simplification of the randomize word included in the Factor standard library), and looks something like this:

: shuffle1 ( seq -- seq )
    dup length [ dup 1 > ] [
       [ random ] [ 1 - ] bi
       [ pick exchange ] keep
    ] while drop ;

On our test case, this takes 2.972 seconds.

Version 2:

Our second implementation uses exchange-unsafe, which does not perform bounds checks (unnecessary given our loop constraints).

: shuffle2 ( seq -- seq )
    dup length [ dup 1 > ] [
       [ random ] [ 1 - ] bi
       [ pick exchange-unsafe ] keep
    ] while drop ;

On our test case, this takes 2.830 seconds (a 5% improvement on version 1!).

Version 3:

Our third implementation uses the typed vocabulary to provide type information to the compiler to produce more optimal code:

TYPED: shuffle3 ( seq: array -- seq )
    dup length [ dup 1 > ] [
       [ random ] [ 1 - ] bi
       [ pick exchange-unsafe ] keep
    ] while drop ;

On our test case, this takes 2.554 seconds (a 15% improvement on version 1!).

Version 4:

Our fourth implementation instead removes the generic dispatch of calling random as well as the dynamic variable lookup for the current random number generator:

: shuffle4 ( seq -- seq )
    dup length [ dup 1 > ]
    random-generator get '[
       [ _ (random-integer) ] [ 1 - ] bi
       [ pick exchange-unsafe ] keep
    ] while drop ;

On our test case, this takes 2.408 seconds (a 19% improvement on version 1!).

Version 5:

Our fifth implementation combines the optimizations in version 3 and 4:

TYPED: shuffle5 ( seq: array -- seq )
    dup length [ dup 1 > ]
    random-generator get '[
        [ _ (random-integer) ] [ 1 - ] bi
        [ pick exchange-unsafe ] keep
    ] while drop ;

On our test case, this takes 2.254 seconds (a 24% improvement on version 1!).

Version 6:

Our sixth implementation declares that the indices to exchange-unsafe are fixnums (removing a minor check that isn't optimized out by the compiler for some reason):

TYPED: shuffle6 ( seq: array -- seq )
    dup length [ dup 1 > ]
    random-generator get '[
        [ _ (random-integer) ] [ 1 - ] bi
        { fixnum fixnum } declare
        [ pick exchange-unsafe ] keep
    ] while drop ;

On our test case, this takes 2.187 seconds (a 27% improvement on version 1!).

Version 7:

Our seventh implementation is just version 6 with a quick-and-dirty random number generator using rand():

FUNCTION: int rand ( ) ;

SINGLETON: c-random
M: c-random random-32* drop rand ;

: with-c-random ( quot -- )
    [ c-random ] dip with-random ; inline

On our test case, this takes 2.015 seconds (a 32% improvement on version 1!).

Version 8:

Our eighth (and final!) version drops down to C, to implement a primitive version of shuffle:

void factor_vm::primitive_shuffle_array()
{
    array* a = untag_check<array>(ctx->peek());
    cell capacity = array_capacity(a);
    for (cell i = capacity - 1; i > 0; i--) {
        cell j = i + rand() / (RAND_MAX / (capacity - i) + 1);
        cell tmp = array_nth(a, i);
        set_array_nth(a, i, array_nth(a, j));
        set_array_nth(a, j, tmp);
    }
}

On our test case, this takes 0.494 seconds (a 83% improvement on version 1!).

Python

I wanted to see how various Python versions performed, so I wrote a simple version that takes 19.025 seconds on Python 2.7.3, 12.436 seconds on Jython 2.5.3, and 1.392 seconds with PyPy 1.9.0:

from random import randrange
import time

n = 10000000
l = list(xrange(n))
t0 = time.time()
i = n
while i > 1:
    j = randrange(i)
    i -= 1
    l[i], l[j] = l[j], l[i]
print 'took %.3f seconds' % (time.time() - t0)

A version using Numpy takes 3.273 seconds:

import numpy as np
import time

a = np.arange(10000000)
t0 = time.time()
np.random.shuffle(a)
print 'took %.3f seconds' % (time.time() - t0)

Ruby

Curious what Ruby would look like, I tested two versions. The first implements shuffle in "pure" Ruby, taking 149.975 seconds on Ruby 1.8.7, 25.086 seconds in Ruby 1.9.3, 8.054 seconds on MacRuby 0.12, 7.258 seconds on JRuby 1.7.3, and 4.682 seconds on Rubinius 1.2.4:

def shuffle!(a)
    size = a.length
    size.times do |i|
        r = i + Kernel.rand(size - i)
        a[i], a[r] = a[r], a[i]
    end
end

a = (1..10000000).to_a
t0 = Time.new
shuffle!(a)
t1 = Time.new
printf "took %.3f seconds", t1-t0

But since Ruby 1.8.7, Array.shuffle! is a builtin (implemented in C), so lets look at how performance differs. It takes 0.443 seconds in Ruby 1.8.7, 0.554 seconds in Ruby 1.9.3, 0.884 seconds in MacRuby 0.12, 2.170 seconds in JRuby 1.7.3, and 3.479 seconds in Rubinius 1.2.4:

a = (1..10000000).to_a
t0 = Time.new
a = a.shuffle!
t1 = Time.new
printf "took %.3f seconds", t1-t0

Conclusion

C is awesome. PyPy is really fast. Factor can be pretty fast. Numpy should be faster. Jython could be faster. Python isn't very fast at all. Ruby is both blazing fast and slower than snails!

Thursday, February 7, 2013

Typoglycemia

Typoglycemia is the name given to an internet meme that went around a few years ago, purporting to demonstrate that "readers can understand the meaning of words in a sentence even when the interior letters of each word are scrambled".

Can you read this?

...it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae.

Well, perhaps it isn't completely true, but it at least is mostly true. How about an implementation of this in Factor?

We will start by building a word to "misspell" a string that is at least four letters long (ignoring any punctuation at the end) and randomizing all the characters except the first and last:

: misspell-word ( word -- word' )
    dup [ ",'.:;!?" member? not ] find-last drop 0 or
    dup 2 > [
        dupd head-slice dup [ Letter? ] all?
        [ rest-slice randomize ] when drop
    ] [ drop ] if ;

Next, we will "misspell" a line of text:

: misspell-line ( line -- line' )
    [ blank? ] split-when [ misspell-word ] map " " join ;

Finally, misspelling a block of text:

: misspell ( string -- string' )
    string-lines [ misspell-line ] map "\n" join ;

You can try it to show it works:

IN: scratchpad "this really works!" misspell .
"this relaly wroks!"

Maybe it would be easier to read if we just randomly swap two letters in the middle of each word, rather than fully randomizing it?

Friday, February 1, 2013

Crafting Code in Factor

An interesting post about crafting code in Clojure discusses how Clojure the language encourages code that is "concise, easier to read, and easier to maintain". I've mentioned before how Factor encourages readability, but I thought I would show how an implementation of the author's task in Factor is quite simple.

The problem statement is to "convert the keys of two maps into a comma-separated string" for nicer error messages. The specific tasks outlined by the author are:

  • Extract all the keys from both maps
  • Remove any duplicates
  • Convert the keys to strings
  • Sort the strings into ascending order
  • Build and return one big string, by concatenating all the key strings, using ", " as a separator
  • Return "<none>" if both maps are empty

Implementation

This specification can be translated more or less directly into Factor:

: sorted-key-list ( assoc1 assoc2 -- string )
    [ keys ] bi@ append members [ "<none>" ] [
        [ present ] map natural-sort ", " join
    ] if-empty ;

Testing

And some unit tests to show it works:

{ "<none>" } [ H{ } H{ } sorted-key-list ] unit-test

{ "barney, fred, wilma" } [
    H{ { "fred" t } }
    H{ { "barney" t } { "wilma" t } }
    sorted-key-list
] unit-test

{ "fred" } [ H{ { "fred" t } } H{ } sorted-key-list ] unit-test

{ "barney, fred" } [
    H{ { "fred" t } }
    H{ { "fred" t } { "barney" t } }
    sorted-key-list
] unit-test

Errata

An improved version of the Clojure version allows a variable number of maps to be combined in this manner. To do so in Factor might use as a macro, or instead just change the stack effect to take a sequence of maps that are combined, something like this:

: sorted-key-list2 ( seq -- string )
    [ keys ] map concat members [ "<none>" ] [
        [ present ] map natural-sort ", " join
    ] if-empty ;