Re: Factor: September 2011

Sunday, September 25, 2011

Approximating Pi

A few days ago, it was announced on the Wolfram Blog that a 13-year-old had made a record calculating 458 million terms for the continued fraction of pi. In the spirit of that, I thought I would show how to solve a question that sometimes gets asked at interviews:

Given that Pi can be estimated using the function 4 * (1 - 1/3 + 1/5 - 1/7 + ...) with more terms giving greater accuracy, write a function that calculates Pi to an accuracy of 5 decimal places.

Using Factor, we can calculate the nth approximation of pi using vector arithmetic:

: approximate-pi ( n -- approx )
    [1,b] 2 v*n 1 v-n 1 swap n/v
    [ odd? [ neg ] when ] map-index sum 4 * ;

This isn't ideal if we want to try an increasing number of terms (looking for a particularly accuracy), since a lot of the the work would be redone unnecessarily. Instead, we can write a word that adds successive terms until the difference between the previous approximation and the current approximation is less than our requested accuracy.

: next-term ( approx i -- approx' )
    [ 2 * 1 + ] [ odd? [ neg ] when ] bi 4.0 swap / + ; inline

:: find-pi-to ( accuracy -- n approx )
    1 4.0 [
        dup pick next-term [ - ] keep
        swap abs accuracy >= [ 1 + ] 2dip
    ] loop ;

To show its performance, we can time it:

( scratchpad ) [ 0.00001 find-pi-to ] time .
Running time: 0.026030341 seconds

3.141597653564762

An equivalent function in Python might look like this:

def find_pi_to(accuracy):
    i = 1
    approx = 4.0
    while 1:
        term = (2 * i) + 1
        if i % 2 == 1:
            term = -term
        new = approx + 4.0/term
        if abs(new - approx) < accuracy:
            approx = new
            break
        i += 1
        approx = new
    return approx

But, if we time this version (not counting startup or compile time), it takes 0.134 seconds. Doing the math shows that Factor is 5 times faster than Python in this case. Not bad.

Thursday, September 22, 2011

Really Big Numbers

Factor supports both fixnum (fixed size integers, typically 32- or 64-bit values) and bignum (arbitrarily large integers). Recently, I discovered that Factor did not have support for calculating the logarithm of really big numbers (those larger than 2¹⁰²⁴).

You can define a simple factorial function:

: factorial ( n -- n! )
    [ 1 ] [ [1,b] product ] if-zero ;

But if you tried to calculate the logarithm of 1000 factorial, it produces the wrong answer.

( scratchpad ) 1000 factorial log .
1/0.

The reason for this is that Factor attempts to convert a bignum into a double-precision floating point number and take the logarithm of that. Unfortunately, the value in this case is too large. What do other languages do in this case?

We could look at Ruby, but it has the same problem as Factor:

>> Math::log((1..1000).inject(:*))
(irb):8: warning: Bignum out of Float range
=> Infinity

However, you can get the right answer in Python:

>>> def factorial(n):
...    r = 1
...    while n > 0:
...        r *= n
...        n -= 1
...    return r
...
>>> math.log(factorial(1000))
5912.128178488163

If you look under the covers, you will see that Python handles this case by calling frexp to split a value into a fraction (x) and a power of two (exp). The original value can be calculated as x*2^exp. Using this, the logarithm can be computed as log(x) + log(2) * exp.

After discussing this on #concatenative, Joe Groff and I came up with a solution for this. I'm not going to go over all the details, but if you're curious, you can look at the discussion.

First, we implemented a cross-platform version of frexp:

GENERIC: frexp ( x -- y exp )

M: float frexp
    dup fp-special? [ dup zero? ] unless* [ 0 ] [
        double>bits
        [
            HEX: 800f,ffff,ffff,ffff bitand
            0.5 double>bits bitor bits>double
        ] [ -52 shift HEX: 7ff bitand 1022 - ] bi
    ] if ; inline

M: integer frexp
    [ 0.0 0 ] [
        dup 0 > [ 1 ] [ abs -1 ] if swap dup log2 [
            52 swap - shift HEX: 000f,ffff,ffff,ffff bitand
            0.5 double>bits bitor bits>double
        ] [ 1 + ] bi [ * ] dip
    ] if-zero ; inline

Next, we added support for log and log10 of bignum. If the number can be represented as a float, we continue to process it as before, but if it is larger, we calculate it similar to Python (with some caching of the log(2) and log10(2) values for performance):

: most-negative-finite-float ( -- x )
    HEX: -1.ffff,ffff,ffff,fp1023 >integer ; inline
: most-positive-finite-float ( -- x )
    HEX:  1.ffff,ffff,ffff,fp1023 >integer ; inline

CONSTANT: log-2   HEX: 1.62e42fefa39efp-1
CONSTANT: log10-2 HEX: 1.34413509f79ffp-2

: (representable-as-float?) ( x -- ? )
    most-negative-finite-float
    most-positive-finite-float between? ; inline

: (bignum-log) ( n log-quot: ( x -- y ) log-2 -- log )
    [ dup ] dip '[
        dup (representable-as-float?)
        [ >float @ ] [ frexp [ @ ] [ _ * ] bi* + ] if
    ] call ; inline

M: bignum log [ log ] log-2 (bignum-log) ;

M: bignum log10 [ log10 ] log10-2 (bignum-log) ;

And now, in the listener you can get the answer!

( scratchpad ) 1000 factorial log .
5912.128178488163

This change is now in the Factor repository (if you'd like to update), and will be in the next release.

Tuesday, September 20, 2011

Enigma Machines

I noticed a fun blog post about recreating the Enigma machine in Python. For those who need a refresher, an Enigma machine was an encryption device used before and during World War II. I thought it would be fun to implement this in Factor.

Build It

Our alphabet will be the 26 letters in the English alphabet, indexed from 0 to 25 ("a" to "z"):

: <alphabet> ( -- seq )
    26 iota >array ;

The Enigma machine is made up of a number of "cogs", which contain an encoding. We will create a cog using a randomized alphabet:

: <cog> ( -- cog )
    <alphabet> randomize ;

We need to create a utility function to remove a random element from a sequence:

: remove-random ( seq -- elt seq' )
    [ length random ] keep [ nth ] [ remove-nth ] 2bi ;

A special rotor called a "reflector" connected pairs of outputs, ensuring that encryption and decryption were the same process. It also gave the Enigma machine the property that no letter encrypted to itself (a flaw which was used to help break the code). We can create a reflector by taking the alphabet, and randomly exchanging pairs of elements:

: <reflector> ( -- reflector )
    <alphabet> dup length iota [ dup empty? ] [
        remove-random remove-random [ pick exchange ] dip
    ] until drop ;

We can now create an Enigma machine with a number of cogs and a reflector:

TUPLE: enigma cogs prev-cogs reflector ;

: <enigma> ( num-cogs -- enigma )
    [ <cog> ] replicate dup clone <reflector> enigma boa ;

We need a way to check for special characters (those not in the 26 letter alphabet):

: special? ( n -- ? )
    [ 25 > ] [ 0 < ] bi or ;

We need a utility function to change several elements of a sequence at the same time:

: change-nths ( indices seq quot: ( elt -- elt' ) -- )
    [ change-nth ] 2curry each ; inline

Encoding a piece of text is where all the work is performed. The main strategy is to apply each character (that isn't special) through the cogs and then the reflector to find the encoded character, and then cycle the cogs to prepare for the next character.

:: encode ( text enigma -- cipher-text )
    0 :> ln!
    enigma cogs>> :> cogs
    enigma reflector>> :> reflector
    text >lower [
        CHAR: a mod dup special? [
            ln 1 + ln!
            cogs [ nth ] each reflector nth
            cogs reverse [ index ] each CHAR: a +
            cogs length iota [ 6 * 1 + ln mod zero? ] filter
            cogs [ unclip prefix ] change-nths
        ] unless
    ] map ;

Note: this converts the text to lowercase before encoding. The encode word could be expanded to support uppercase, but its much simpler this way.

If we make a word to reset the cogs back to the original configuration, we can verify that the encrypted text can be decrypted back to the original.

: reset-cogs ( enigma -- enigma )
    dup prev-cogs>> >>cogs ;

Try It

We can experiment with this in the Listener, creating a 4-cog Enigma machine and then encoding, resetting, and encoding again:

( scratchpad ) 4 <enigma>

( scratchpad ) "hello, world" over encode [ print ] keep
luhhn, xnzha

( scratchpad ) over reset-cogs encode print
hello, world

The code for this is on my Github.

Sunday, September 18, 2011

Most Pressed Keys

Mahdi Yusuf made a fun blog post about which keys get pressed the most in various programming languages. I thought I would generate one for Factor:

This was assembled by taking all the source code (not documentation or tests) from core/ and basis/ in the Factor codebase and passing it through heatmap.js.

Sunday, September 11, 2011

Manipulating Files

Java has had historically frustrating API's for interacting with files. In Java 7, these API's were cleaned up a little bit. A blog post demonstrates some examples of using these new API's. I would say, though, that Factor demonstrates a much simpler cross-platform API:

Creating and Deleting Files

To create a file, simply:

( scratchpad ) "/path/to/file" touch-file

To update file permissions (on unix systems), you can use the io.files.info.unix vocabulary:

( scratchpad ) "/path/to/file" OCT: 666 set-file-permissions

Note: it would be nice to support parsing "rw-rw-rw-" and "g+x" and similar permission strings, and probably not very difficult.

To delete a file, simply:

( scratchpad ) "/path/to/file" delete-file

Copying and Moving Files

To copy a file from one path to another:

( scratchpad ) "/path/from" "/path/to" copy-file

To copy a file, and preserve its file permissions:

( scratchpad ) "/path/from" "/path/to" copy-file-and-info

To copy a file into a directory:

( scratchpad ) "/dir1/file" "/dir2" copy-file-into

To move a file from one path to another:

( scratchpad ) "/path/from" "/path/to" move-file

To move a file into a directory:

( scratchpad ) "/dir1/file" "/dir2" move-file-into

Tuesday, September 6, 2011

echo

One of the most basic unix utilities is echo, which is used to "display a line of text". Below, I show how to implement this in Factor.

The usage for echo is usually written like this:

echo [-n] [string ...]

The -n option is to "not print the trailing newline character". We can make a word that checks the first argument for -n, and returns a boolean and the remaining arguments for formatting:

: -n? ( args -- ? args' )
    [ first "-n" = ] keep over [ rest ] when ;

Since the string arguments are separated by a single blank space, we can join and write them, optionally printing a trailing newline.

: echo-args ( args -- )
    -n? " " join write [ nl ] unless ;

We can define a "main" word that allows our vocabulary to be directly run or deployed:

: run-echo ( -- )
    command-line get [ nl ] [ echo-args ] if-empty ;

MAIN: run-echo

The code for this is on my Github.

Saturday, September 3, 2011

Fun with WAV

Last year, a "Fun with wav" post got a lot of visibility. The author was trying to extract the header from a audio file in the WAV format and output the sum of the remaining data in the file. He gives the disclaimer several times that this was a specific hack and not a generalized solution.

Although the original solution was in C, he received other possible solutions on Reddit. The "winner" by code golf rules is probably Ruby:

data = ARGF.read
keys = %w[id totallength wavefmt format
          pcm channels frequency bytes_per_second
          bytes_by_capture bits_per_sample
          data bytes_in_data sum
]
values = data.unpack 'Z4 i Z8 i s s i i s s Z4 i s*'
sum = values.drop(12).map(&:abs).inject(:+)
keys.zip(values.take(12) << sum) {|k, v|
      puts "#{k.ljust 17}: #{v}"
}

Build It

While not attempting to "golf", I wanted to show how this might be implemented in Factor. First, some imports:

USING: alien.c-types classes.struct kernel io
io.encodings.binary io.files math specialized-arrays ;

FROM: sequences => map-sum ;

IN: wavsum

Each WAV file begins with a "master RIFF chunk" followed by format information and the sampled data. We could read each field specifically, or we can capture this header information directly into a packed structure (I added support for these in January and recently merged it into the main Factor repository).

PACKED-STRUCT: header
    { id char[4] }
    { totallength int }
    { wavefmt char[8] }
    { format int }
    { pcm short }
    { channels short }
    { frequency int }
    { bytes_per_second int }
    { bytes_by_capture short }
    { bits_per_sample short }
    { data char[4] }
    { bytes_in_data int } ;

We can easily read from an input stream directly into this structure:

: read-header ( -- header )
    header [ heap-size read ] [ memory>struct ] bi ;

The original solution then produced a sum of the remaining file, treated as a sequence of shorts (16-bit integers).

SPECIALIZED-ARRAY: short

: sum-contents ( -- sum )
    contents short-array-cast [ abs ] map-sum ;

Producing a "wavsum" from a file:

: wavsum ( path -- header sum )
    binary [ read-header sum-contents ] with-file-reader ;

Try It

We can try it on a sample wav file that I included with the vocabulary and we get the same output as the Ruby and C versions:

( scratchpad ) "vocab:wavsum/truck.wav" wavsum [ . ] bi@
S{ header
    { id char-array{ 82 73 70 70 } }
    { totallength 66888 }
    { wavefmt char-array{ 87 65 86 69 102 109 116 32 } }
    { format 50 }
    { pcm 2 }
    { channels 1 }
    { frequency 22050 }
    { bytes_per_second 10752 }
    { bytes_by_capture 512 }
    { bits_per_sample 4 }
    { data char-array{ 32 0 -12 3 } }
    { bytes_in_data 16777223 }
}
392717699

It might be useful to add some validation to this example (much like the original C version) for such things as endianness and the 16-bit WAV format. Alternatively, we could improve it to be more general to handle 8-bit or 24-bit encodings, as well as other header formats (not just the "extended WAV" format).

The code for this is on my Github.