Tuesday, February 21, 2012

TXON

The TXON, also known as "Text Object Notation", is a proposed format for structured data.

Much less popular than other formats such as JSON, XML, or even INI files - I thought it would still be fun to implement encode and decode words in Factor.

An example TXON might look something like this:

Factor:`
    url:`http://factorcode.org`
    development:`Started in 2003`
    license:`Open source (BSD license)`
    influences:`Forth, Lisp, and Smalltalk`
`

Encoding

Since TXON uses "`" characters to delimit values, we need to escape them:

: encode-value ( string -- string' )
    R" `" "\\`" re-replace ;

To implement encoding in a generic way, we dispatch on the type of object being encoded:

GENERIC: >txon ( object -- string )

M: sequence >txon
    [ >txon ] map "\n" join ;

M: assoc >txon
    >alist [
        first2 [ encode-value ] [ >txon ] bi* "%s:`%s`" sprintf
    ] map "\n" join ;

M: string >txon
    encode-value ;

M: number >txon
    number>string >txon ;

Decoding

Although the TXON specification includes an EBNF grammar, I am going to show one way to build a parser from scratch. In the tradition of concatenative languages, we will build our decoder from several smaller words.

For symmetry with the encode-value word, we need a way to unescape the ` characters:

: decode-value ( string -- string' )
    R" \\`" "`" re-replace ;

Since the TXON format is a series of name:`value` pairs, we can parse the name by finding the separator and then decoding the name (which might contain escaped characters):

: parse-name ( string -- remain name )
    ":`" split1 swap decode-value ;

To build a word that finds the first (unescaped) ` character, we will first make a word that looks at adjacent characters, returning true if the second character is an unescaped `:

: `? ( ch1 ch2 -- ? )
    [ CHAR: \ = not ] [ CHAR: ` = ] bi* and ;

By grouping the string into adjacent characters, we can find the first unescaped ` (specially handling the case where the first character is an `):

: (find-`) ( string -- n/f )
    2 clump [ first2 `? ] find drop [ 1 + ] [ f ] if* ;

: find-` ( string -- n/f )
    dup ?first CHAR: ` = [ drop 0 ] [ (find-`) ] if ;

Parsing the value is slightly complicated by the fact that TXON supports values which might themselves be a single value, a sequence of values, or a series of name/value pairs. Basically, that means we need to:

  1. find the first ` character
  2. checks if the previous character is a : (indicating a name/value)
  3. parse all name/values if so, otherwise decode the value(s)

That algorithm can be translated into this code:

DEFER: name/values

: (parse-value) ( string -- values )
    decode-value string-lines dup length 1 = [ first ] when ;

: parse-value ( string -- remain value )
    dup find-` [
        dup 1 - pick ?nth CHAR: : =
        [ drop name/values ] [ cut swap (parse-value) ] if
        [ rest [ blank? ] trim-head ] dip
    ] [ f swap ] if* ;

We want to parse a "name=value" pair, which should be as easy as parsing the name, then the value, then associating into a hashtable:

: (name=value) ( string -- remain term )
    parse-name [ parse-value ] dip associate ;

The string might contain a "name=value" pair, or just a single value:

: name=value ( string -- remain term )
    [ blank? ] trim
    ":`" over subseq? [ (name=value) ] [ f swap ] if ;

We finish by building a word to produce all "name=value" pairs, used in the parse-value word earlier.

: name/values ( string -- remain terms )
    [ dup { [ empty? not ] [ first CHAR: ` = not ] } 1&& ]
    [ name=value ] produce assoc-combine ;

Putting all of that together, we can make a word to parse a TXON string, producing "name=value" pairs until exhausted:

: parse-txon ( string -- objects )
    [ dup empty? not ] [ name=value ] produce nip ;

: txon> ( string -- object )
    parse-txon dup length 1 = [ first ] when ;

Try It

You can try this out in the listener:

IN: scratchpad H{ { "a" "123" } } >txon .
"a:`123`"

IN: scratchpad "a:`123`" txon> .
H{ { "a" "123" } }

Can you improve on this? Maybe by using the peg.ebnf vocabulary to create an EBNF parsing word?

The code for this (and a bunch of tests) are on my Github.

Monday, February 13, 2012

Readability

James O'Beirne wrote a great blog post on why languages matter with some thoughts on predictability, readability, and compactness. In it, he compares some examples of code in dynamic languages such as PHP, Python, and Groovy.

I wanted to compare his simple "readability" examples with Factor, to show why concatenative programming matters.

PHP

This example of some PHP code:

<?php
$x = 1;
$nums = array(10, 20, 30, 40);
$res = 0;

foreach ($nums as $n)
  if ($n > 15)
    $res -= $n*2 + $x;

Groovy

He compares the PHP code favorably to this Groovy code:

def x = 1
def nums = [10, 20, 30, 40]
def res = nums.findAll { it > 15 } 
    .collect { it * 2 + x } 
    .inject(0) {accum, val -> accum - val}

Python

Although James doesn't show a Python example, it could look something like this:

x = 1
nums = [10,20,30,40]
res = 0
for n in nums:
    if n > 15:
        res -= (n*2) + x

We could improve this by using a generator expression (or, equivalently, a list comprehension)):

>>> def foo(x, nums):
...     return -sum(n*2+x for n in nums if n > 15)
... 

>>> foo(1, [10,20,30,40])
-183

Factor

I would argue that a Factor version is pretty readable:

IN: scratchpad { 10 20 30 40 } [ 15 > ] filter
               [ 2 * 1 + ] map sum neg .
-183

If you wanted to factor (ahem) this into a reusable word:

: foo ( x nums -- res )
    [ 15 > ] filter [ 2 * + ] with map sum neg ;

IN: scratchpad 1 { 10 20 30 40 } foo .
-183

Thursday, February 2, 2012

copy

I've used Factor to build several common unix programs including cat, fortune, wc, and others.

Today, I wanted to show how to build the cp ("copy") program using the simple file manipulation words. If we look at the man page, we can see that its usage is two-fold:

  1. Copy several source files to a destination directory
  2. Copy a source file to a destination file or directory

We can make a nice usage string to display if the arguments are not correct:

: usage ( -- )
    "Usage: copy source ... target" print ;

We can implement the first usage, copy-to-dir, by checking to see that the destination is a directory before calling copy-files-into, or printing the usage if it is not:

: copy-to-dir ( args -- )
    dup last file-info directory?
    [ unclip-last copy-files-into ] [ drop usage ] if ;

The second usage, copy-to-file, first checks if the destination exists and is a directory (if so calling our copy-to-dir word), otherwise calling copy-file:

: copy-to-file ( args -- )
    dup last { [ exists? ] [ file-info directory? ] } 1&&
    [ copy-to-dir ] [ first2 copy-file ] if ;

Putting it all together, we can implement our program by checking the number of arguments and assuming the two-argument version is copy-to-file and more arguments are copy-to-dir (anything less gets the usage):

: run-copy ( -- )
    command-line get dup length {
        { [ dup 2 > ] [ drop copy-to-dir  ] }
        { [ dup 2 = ] [ drop copy-to-file ] }
        [ 2drop usage ]
    } cond ;

MAIN: run-copy

The code for this is on my Github.