Tuesday, February 21, 2012

TXON

The TXON, also known as "Text Object Notation", is a proposed format for structured data.

Much less popular than other formats such as JSON, XML, or even INI files - I thought it would still be fun to implement encode and decode words in Factor.

An example TXON might look something like this:

Factor:`
    url:`http://factorcode.org`
    development:`Started in 2003`
    license:`Open source (BSD license)`
    influences:`Forth, Lisp, and Smalltalk`
`

Encoding

Since TXON uses "`" characters to delimit values, we need to escape them:

: encode-value ( string -- string' )
    R" `" "\\`" re-replace ;

To implement encoding in a generic way, we dispatch on the type of object being encoded:

GENERIC: >txon ( object -- string )

M: sequence >txon
    [ >txon ] map "\n" join ;

M: assoc >txon
    >alist [
        first2 [ encode-value ] [ >txon ] bi* "%s:`%s`" sprintf
    ] map "\n" join ;

M: string >txon
    encode-value ;

M: number >txon
    number>string >txon ;

Decoding

Although the TXON specification includes an EBNF grammar, I am going to show one way to build a parser from scratch. In the tradition of concatenative languages, we will build our decoder from several smaller words.

For symmetry with the encode-value word, we need a way to unescape the ` characters:

: decode-value ( string -- string' )
    R" \\`" "`" re-replace ;

Since the TXON format is a series of name:`value` pairs, we can parse the name by finding the separator and then decoding the name (which might contain escaped characters):

: parse-name ( string -- remain name )
    ":`" split1 swap decode-value ;

To build a word that finds the first (unescaped) ` character, we will first make a word that looks at adjacent characters, returning true if the second character is an unescaped `:

: `? ( ch1 ch2 -- ? )
    [ CHAR: \ = not ] [ CHAR: ` = ] bi* and ;

By grouping the string into adjacent characters, we can find the first unescaped ` (specially handling the case where the first character is an `):

: (find-`) ( string -- n/f )
    2 clump [ first2 `? ] find drop [ 1 + ] [ f ] if* ;

: find-` ( string -- n/f )
    dup ?first CHAR: ` = [ drop 0 ] [ (find-`) ] if ;

Parsing the value is slightly complicated by the fact that TXON supports values which might themselves be a single value, a sequence of values, or a series of name/value pairs. Basically, that means we need to:

  1. find the first ` character
  2. checks if the previous character is a : (indicating a name/value)
  3. parse all name/values if so, otherwise decode the value(s)

That algorithm can be translated into this code:

DEFER: name/values

: (parse-value) ( string -- values )
    decode-value string-lines dup length 1 = [ first ] when ;

: parse-value ( string -- remain value )
    dup find-` [
        dup 1 - pick ?nth CHAR: : =
        [ drop name/values ] [ cut swap (parse-value) ] if
        [ rest [ blank? ] trim-head ] dip
    ] [ f swap ] if* ;

We want to parse a "name=value" pair, which should be as easy as parsing the name, then the value, then associating into a hashtable:

: (name=value) ( string -- remain term )
    parse-name [ parse-value ] dip associate ;

The string might contain a "name=value" pair, or just a single value:

: name=value ( string -- remain term )
    [ blank? ] trim
    ":`" over subseq? [ (name=value) ] [ f swap ] if ;

We finish by building a word to produce all "name=value" pairs, used in the parse-value word earlier.

: name/values ( string -- remain terms )
    [ dup { [ empty? not ] [ first CHAR: ` = not ] } 1&& ]
    [ name=value ] produce assoc-combine ;

Putting all of that together, we can make a word to parse a TXON string, producing "name=value" pairs until exhausted:

: parse-txon ( string -- objects )
    [ dup empty? not ] [ name=value ] produce nip ;

: txon> ( string -- object )
    parse-txon dup length 1 = [ first ] when ;

Try It

You can try this out in the listener:

IN: scratchpad H{ { "a" "123" } } >txon .
"a:`123`"

IN: scratchpad "a:`123`" txon> .
H{ { "a" "123" } }

Can you improve on this? Maybe by using the peg.ebnf vocabulary to create an EBNF parsing word?

The code for this (and a bunch of tests) are on my Github.

Monday, February 13, 2012

Readability

James O'Beirne wrote a great blog post on why languages matter with some thoughts on predictability, readability, and compactness. In it, he compares some examples of code in dynamic languages such as PHP, Python, and Groovy.

I wanted to compare his simple "readability" examples with Factor, to show why concatenative programming matters.

PHP

This example of some PHP code:

<?php
$x = 1;
$nums = array(10, 20, 30, 40);
$res = 0;

foreach ($nums as $n)
  if ($n > 15)
    $res -= $n*2 + $x;

Groovy

He compares the PHP code favorably to this Groovy code:

def x = 1
def nums = [10, 20, 30, 40]
def res = nums.findAll { it > 15 } 
    .collect { it * 2 + x } 
    .inject(0) {accum, val -> accum - val}

Python

Although James doesn't show a Python example, it could look something like this:

x = 1
nums = [10,20,30,40]
res = 0
for n in nums:
    if n > 15:
        res -= (n*2) + x

We could improve this by using a generator expression (or, equivalently, a list comprehension)):

>>> def foo(x, nums):
...     return -sum(n*2+x for n in nums if n > 15)
... 

>>> foo(1, [10,20,30,40])
-183

Factor

I would argue that a Factor version is pretty readable:

IN: scratchpad { 10 20 30 40 } [ 15 > ] filter
               [ 2 * 1 + ] map sum neg .
-183

If you wanted to factor (ahem) this into a reusable word:

: foo ( x nums -- res )
    [ 15 > ] filter [ 2 * + ] with map sum neg ;

IN: scratchpad 1 { 10 20 30 40 } foo .
-183

Thursday, February 2, 2012

copy

I've used Factor to build several common unix programs including cat, fortune, wc, and others.

Today, I wanted to show how to build the cp ("copy") program using the simple file manipulation words. If we look at the man page, we can see that its usage is two-fold:

  1. Copy several source files to a destination directory
  2. Copy a source file to a destination file or directory

We can make a nice usage string to display if the arguments are not correct:

: usage ( -- )
    "Usage: copy source ... target" print ;

We can implement the first usage, copy-to-dir, by checking to see that the destination is a directory before calling copy-files-into, or printing the usage if it is not:

: copy-to-dir ( args -- )
    dup last file-info directory?
    [ unclip-last copy-files-into ] [ drop usage ] if ;

The second usage, copy-to-file, first checks if the destination exists and is a directory (if so calling our copy-to-dir word), otherwise calling copy-file:

: copy-to-file ( args -- )
    dup last { [ exists? ] [ file-info directory? ] } 1&&
    [ copy-to-dir ] [ first2 copy-file ] if ;

Putting it all together, we can implement our program by checking the number of arguments and assuming the two-argument version is copy-to-file and more arguments are copy-to-dir (anything less gets the usage):

: run-copy ( -- )
    command-line get dup length {
        { [ dup 2 > ] [ drop copy-to-dir  ] }
        { [ dup 2 = ] [ drop copy-to-file ] }
        [ 2drop usage ]
    } cond ;

MAIN: run-copy

The code for this is on my Github.

Wednesday, January 25, 2012

Colored Timestamps

I noticed a fun post in early December that implements a mapping between current time and a "unique" RGBA color. I thought it might be fun to use Factor to implement a colored clock.

The basic concept is to map the 4,294,967,296 unique RGBA colors to seconds, which gives just over 136 years of unique colors.

timestamp>rgba

We calculate timestamps as an offset from Dennis Ritchie's birthday:

: start-date ( -- timestamp )
    1941 9 9 <date> ; inline

The offset is an elapsed number of seconds from the start date:

: elapsed ( timestamp -- seconds )
    start-date time- duration>seconds >integer ;

The conversion from a timestamp into a unique RGBA color does successive divmod operations to map into Red, Green, Blue, and Alpha values:

: timestamp>rgba ( timestamp -- color/f )
    elapsed dup 0 32 2^ between? [
        24 2^ /mod 16 2^ /mod 8 2^ /mod
        [ 255 /f ] 4 napply <rgba>
    ] [ drop f ] if ;

You can try it for yourself, showing how the values change over time:

IN: scratchpad start-date timestamp>rgba .
T{ rgba
    { red 0.0 }
    { green 0.0 }
    { blue 0.0 }
    { alpha 0.0 }
}

IN: scratchpad now timestamp>rgba .
T{ rgba
    { red 0.5176470588235295 }
    { green 0.3803921568627451 }
    { blue 0.4313725490196079 }
    { alpha 0.3333333333333333 }
}

<rgba-clock>

Let's use the timestamp>rgba word to make an updating "colored clock". Specifically, we can use an arrow model to update a label every second to create an RGBA clock:

: update-colors ( color label -- )
    [ font>> background<< ]
    [ [ <solid> ] dip [ interior<< ] [ boundary<< ] 2bi ]
    2bi ;

: <rgba-clock> ( -- gadget )
    f <label-control>
        time get over '[
            [ timestamp>rgba _ update-colors ]
            [ timestamp>hms ] bi
        ] <arrow> >>model
        "HH:MM:SS" >>string
        monospace-font >>font ;

Use the gadget. word to try it in your listener, and watch it update:

IN: scratchpad <rgba-clock> gadget.

The code for this is on my Github.

Friday, January 13, 2012

Friday the 13th

In honor of January 13, 2012, a Friday the 13th, I thought it might be fun to use Factor to explore similar dates in past and future history. According to Wikipedia, such a day "occurs at least once, but at most three times a year".

friday-13th?

A day is "Friday the 13th" if it is both (a) Friday and (b) the 13th:

: friday-13th? ( timestamp -- ? )
    [ day>> 13 = ] [ friday? ] bi and ;

Trying it for today and tomorrow, to make sure it works:

IN: scratchpad now friday-13th? .
t

IN: scratchpad : tomorrow ( -- timestamp )
                   now 1 days time+ ;

               tomorrow friday-13th? .
f

friday-13ths

Getting all Friday the 13th's for a given year:

: friday-13ths ( year -- seq )
    12 [0,b) [
        13 <date> dup friday? [ drop f ] unless
    ] with map sift ;

Or, for a range of years:

: all-friday-13ths ( start-year end-year -- seq )
    [a,b] [ friday-13ths ] map concat ;

Trying it for 2012:

IN: scratchpad 2012 friday-13ths .
{
    T{ timestamp
        { year 2012 }
        { month 1 }
        { day 13 }
    }
    T{ timestamp
        { year 2012 }
        { month 4 }
        { day 13 }
    }
    T{ timestamp
        { year 2012 }
        { month 7 }
        { day 13 }
    }
}

next-friday-13th

We can iterate, looking for the next Friday the 13th:

: next-friday-13th ( timestamp -- date )
    dup day>> 13 >= [ 1 months time+ ] when 13 >>day
    [ dup friday? not ] [ 1 months time+ ] while ;

Trying it for today, shows the next Friday the 13th is April, 13, 2012:

IN: scratchpad now next-friday-13th .
T{ timestamp
    { year 2012 }
    { month 4 }
    { day 13 }
}

The code (and some tests) for this is on my Github.

Tuesday, January 3, 2012

Duplicate Files

A few months ago, Jon Cooper wrote a duplicate file checker in Go and Ruby.

Below, I contribute a simple version in Factor that runs faster than both Go and Ruby solutions. In the spirit of the original article, I have separated the logic into steps.

Argument Parsing

The command-line vocabulary gives us the arguments passed to the script. We check for the verbose flag and the root directory to traverse:

: arg? ( name args -- args' ? )
    2dup member? [ remove t ] [ nip f ] if ;

: parse-args ( -- verbose? root )
    "--verbose" command-line get arg? swap first ;

Filesystem Traversal

We can traverse the filesystem with the each-file word (choosing breadth-first instead of depth-first). In our case, we want to collect these files into a map of all paths that share a common filename:

: collect-files ( path -- assoc )
    t H{ } clone [
        '[ dup file-name _ push-at ] each-file
    ] keep ;

Our duplicate files are those files that share a common filename:

: duplicate-files ( path -- dupes )
    collect-files [ nip length 1 > ] assoc-filter! ;

MD5 Hashing Files

Using the checksums.md5 vocabulary, it is quite simple:

: md5-file ( path -- string )
    md5 checksum-file hex-string ;

Printing Results

If verbose is selected, then we print each filename and the MD5 checksum for each full path:

: print-md5 ( name paths -- )
    [ "%s:\n" printf ] [
        [ dup md5-file "  %s\n    %s\n" printf ] each
    ] bi* ;

We put this all together by calculating the possible duplicate files, optionally printing verbose MD5 checksums, and then print the total number of duplicates detected:

: run-dupe ( -- )
    parse-args duplicate-files swap
    [ dup [ print-md5 ] assoc-each ] when
    assoc-size "Total duped files found: %d\n" printf ;

Performance

I tested performance using two directory trees, one with over 500 files and another with almost 36,000 files. While the original article focuses more on syntax than speed, it is nice to see that the Factor solution is faster than the Go and Ruby versions.

DuplicatesFactorGoRuby
5831.4532.2983.861
35,95319.08424.45230.597

The above time is seconds on my laptop.

The code for this is on my Github.

Saturday, December 31, 2011

Picomath

The Picomath project holds some reusable math functions inspired by John D. Cook's Stand-alone code for numerical computing, including:

  • Error function
  • Phi (standard normal CDF)
  • Phi inverse
  • Gamma
  • Log Gamma
  • exp(x) - 1 (for small x)
  • log(n!)

These functions are implemented in an impressive list of languages: Ada, C++, C#, D, Erlang, Go, Haskell, Java, Javascript, Lua, Pascal, Perl, PHP, Python (2.x and 3.x), Ruby, Scheme, and Tcl.

And now Factor!

You can find the code (and a bunch of tests) for this on my Github.