TXON, also known as "Text Object Notation", is a proposed format for structured data.
It is much less popular than other formats such as JSON, XML, or even INI files, but I thought it would still be fun to implement encode and decode words in Factor.
An example TXON might look something like this:
Factor:`
url:`http://factorcode.org`
development:`Started in 2003`
license:`Open source (BSD license)`
influences:`Forth, Lisp, and Smalltalk`
`
Encoding
Since TXON uses "`" characters to delimit values, we need to escape them:
: encode-value ( string -- string' ) R" `" "\\`" re-replace ;
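As a quick check of the escaping, here is a small listener sketch. It assumes encode-value (and the regexp vocabulary it uses) is already loaded; the sample string is made up for illustration.

```factor
! Assumes encode-value from above is loaded in the listener.
! Every backtick in the input gains a leading backslash.
"say `hi`" encode-value print
! prints: say \`hi\`
```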
To implement encoding in a generic way, we dispatch on the type of object being encoded:
GENERIC: >txon ( object -- string )

M: sequence >txon
    [ >txon ] map "\n" join ;

M: assoc >txon
    >alist [
        first2 [ encode-value ] [ >txon ] bi* "%s:`%s`" sprintf
    ] map "\n" join ;

M: string >txon
    encode-value ;

M: number >txon
    number>string >txon ;
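To see how the dispatch plays out, here is a short sketch with made-up inputs. It assumes the >txon methods above are loaded, and relies on Factor choosing the most specific method (so a string is handled by the string method rather than the sequence one).

```factor
! Assumes the >txon generic and its methods are loaded.
! A sequence of strings becomes one value per line.
{ "a" "b" } >txon print
! prints:
! a
! b

! A number is converted to a string before being written as a value.
H{ { "n" 42 } } >txon print
! prints: n:`42`
```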
Decoding
Although the TXON specification includes an EBNF grammar, I am going to show one way to build a parser from scratch. In the tradition of concatenative languages, we will build our decoder from several smaller words.
For symmetry with the encode-value word, we need a way to unescape the ` characters:
: decode-value ( string -- string' ) R" \\`" "`" re-replace ;
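A round trip through both words should give back the original string. This is only a sketch, assuming encode-value and decode-value are loaded as defined above.

```factor
! Assumes encode-value and decode-value are loaded in the listener.
"say `hi`" encode-value decode-value print
! prints: say `hi`
```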
Since the TXON format is a series of name:`value` pairs, we can parse the name by finding the separator and then decoding the name (which might contain escaped characters):
: parse-name ( string -- remain name ) ":`" split1 swap decode-value ;
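For example, splitting a simple pair leaves the unparsed remainder underneath the name on the stack. The sketch below assumes parse-name is loaded, and collects both results into an array only to make them easy to print.

```factor
! Assumes parse-name from above is loaded in the listener.
! The remainder still carries the closing backtick of the value.
"a:`123`" parse-name 2array .
! prints: { "123`" "a" }
```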
To build a word that finds the first (unescaped) ` character, we will first make a word that looks at adjacent characters, returning true if the second character is an unescaped `:
: `? ( ch1 ch2 -- ? ) [ CHAR: \ = not ] [ CHAR: ` = ] bi* and ;
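A few sample character pairs show the intent. This sketch assumes the `? word is loaded, and uses first2 on two-character strings simply as a convenient way to push the two characters.

```factor
! Assumes the `? word from above is loaded in the listener.
"a`" first2 `? .     ! t  (a backtick that is not escaped)
"\\`" first2 `? .    ! f  (a backtick preceded by a backslash)
"ab" first2 `? .     ! f  (the second character is not a backtick)
```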
By grouping the string into adjacent characters, we can find the first unescaped ` (specially handling the case where the first character is an `):
: (find-`) ( string -- n/f )
    2 clump [ first2 `? ] find drop [ 1 + ] [ f ] if* ;

: find-` ( string -- n/f )
    dup ?first CHAR: ` = [ drop 0 ] [ (find-`) ] if ;
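Some sample inputs, traced by hand against the definitions above, illustrate the indices returned. This is a sketch that assumes find-` is loaded; the strings are made up.

```factor
! Assumes find-` from above is loaded in the listener.
"abc`def" find-` .     ! 3  (index of the first backtick)
"a\\`b`c" find-` .     ! 4  (the escaped backtick at index 2 is skipped)
"`abc" find-` .        ! 0  (leading backtick, handled specially)
"abc" find-` .         ! f  (no backtick at all)
```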
Parsing the value is slightly complicated by the fact that TXON supports values which might themselves be a single value, a sequence of values, or a series of name/value pairs. Basically, that means we need to:
- find the first ` character
- check if the previous character is a : (indicating a name/value)
- parse all name/values if so, otherwise decode the value(s)
That algorithm can be translated into this code:
DEFER: name/values

: (parse-value) ( string -- values )
    decode-value string-lines dup length 1 = [ first ] when ;

: parse-value ( string -- remain value )
    dup find-` [
        dup 1 - pick ?nth CHAR: : =
        [ drop name/values ] [ cut swap (parse-value) ] if
        [ rest [ blank? ] trim-head ] dip
    ] [ f swap ] if* ;
We want to parse a "name=value" pair, which should be as easy as parsing the name, then the value, then associating into a hashtable:
: (name=value) ( string -- remain term ) parse-name [ parse-value ] dip associate ;
The string might contain a "name=value" pair, or just a single value:
: name=value ( string -- remain term )
    [ blank? ] trim ":`" over subseq?
    [ (name=value) ] [ f swap ] if ;
We finish by building a word to produce all "name=value" pairs, used in the parse-value word earlier.
: name/values ( string -- remain terms )
    [ dup { [ empty? not ] [ first CHAR: ` = not ] } 1&& ]
    [ name=value ] produce assoc-combine ;
Putting all of that together, we can make a word to parse a TXON string, producing "name=value" pairs until exhausted:
: parse-txon ( string -- objects )
    [ dup empty? not ] [ name=value ] produce nip ;

: txon> ( string -- object )
    parse-txon dup length 1 = [ first ] when ;
Try It
You can try this out in the listener:
IN: scratchpad H{ { "a" "123" } } >txon .
"a:`123`"

IN: scratchpad "a:`123`" txon> .
H{ { "a" "123" } }
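Nested name/value pairs should decode into nested hashtables. The sketch below uses a shortened, made-up variant of the example from the top of the post (the "license" value is abbreviated for illustration) and assumes txon> and its helper words are loaded. It looks up a single key rather than printing the whole hashtable, since key order is not guaranteed.

```factor
! Assumes txon> and the helper words above are loaded in the listener.
! The input is a shortened variant of the example at the top of the post.
"Factor:`\nurl:`http://factorcode.org`\nlicense:`BSD`\n`" txon>

! The result is a hashtable keyed by "Factor", holding an inner hashtable.
"Factor" swap at "license" swap at print
! prints: BSD
```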
Can you improve on this? Maybe by using the peg.ebnf vocabulary to create an EBNF parsing word?
The code for this (and a bunch of tests) is on my GitHub.