You might remember that I implemented a bunch of factorial words several months ago.

Recently, I was looking at Python's latest mathmodule.c, and noticed an implementation of a "divide-and-conquer factorial algorithm". The algorithm is based on one described by Peter Luschny as the binary-split formula of the factorial of n. It seemed like a fun thing to implement and benchmark in Factor.

It turns out this was one of the improvements made during the Python 3.2 development cycle. Calculating the 50,000th factorial takes **0.724 seconds** with Python 2.7.5 and only **0.064 seconds** with Python 3.3.2 - a nice improvement!

## slow-factorial

The most basic factorial would just multiply the numbers from `1`

to `n`

:

: slow-factorial ( n -- n! ) dup 1 > [ [1,b] 1 [ * ] reduce ] [ drop 1 ] if ;

Calculating the 50,000 factorial this way takes **0.915 seconds**.

## factorial

It turns out that doing that causes a lot of arbitrary-precision bignum multiplication. We can instead use binary-reduce (used by the standard library product and sum words), which is a generic algorithm that recursively multiplies groups of numbers which allows for more fixnum multiplications.

: factorial ( n -- n! ) dup 1 > [ [1,b] 1 [ * ] binary-reduce ] [ drop 1 ] if ;

Calculating the 50,000 factorial takes **0.217 seconds**.

## fast-factorial

We can implement Peter Luschny's algorithm, keeping the structure very close to his sample code:

:: part-product ( n m -- x ) { { [ m n 1 + <= ] [ n ] } { [ m n 2 + = ] [ n m * ] } [ n m + 2 /i dup even? [ 1 - ] when :> k n k k 2 + m [ part-product ] 2bi@ * ] } cond ; :: factorial-loop ( n p r -- p' r' ) n 2 > [ n 2 /i :> mid mid p r factorial-loop [ mid 1 + mid 1 bitand + n 1 - n 1 bitand + part-product * ] [ dupd * ] bi* ] [ p r ] if ; : fast-factorial ( x -- n ) [ 1 1 factorial-loop nip ] [ dup bit-count - ] bi shift ;

Calculating the 50,000 factorial now only takes **0.171 seconds**!

The code for this is on my GitHub.

*Note: there are other algorithms that are likely faster, but for this purpose I wanted to implement a similar algorithm to Python. Their implementation has a few modifications including a tight loop for multiplications that fit into an unsigned long and is over 150 lines of code.*

## No comments:

Post a Comment