Sunday, March 20, 2011

Typed Netstrings

A few hours ago, Zed Shaw tweeted about an experiment creating typed netstrings:

An experiment in tagging netstrings with the types of their data as an alternative to JSON: than a minute ago via web

I thought that an implementation in Factor would contrast nicely with Zed's Python version. The basic idea is to support four kinds of data types:

  • Text
  • Numbers
  • Lists
  • Dictionaries (e.g., maps or associations)

First, some imports:

USING: arrays combinators formatting hashtables kernel
math.parser sequences splitting ;

An encoded payload value looks something like "{LENGTH}:{PAYLOAD}{TYPE}". We can write a simple word to parse a string that looks like that into its parts:

: parse-payload ( data -- remain payload payload-type )
    ":" split1 swap string>number cut unclip swapd ;

We can build a simple parser for these "typed netstrings" (deferring for the moment, how we parse lists and dictionaries):

DEFER: parse-dict
DEFER: parse-list

: parse-tnetstring ( data -- remain value )
    parse-payload {
        { CHAR: # [ string>number ] }
        { CHAR: " [ ] }
        { CHAR: } [ parse-dict ] }
        { CHAR: ] [ parse-list ] }
        [ "Invalid payload type: %c" sprintf throw ]
    } case ;

Parsing lists is just repeatedly parsing values until the remainder is exhausted:

: parse-list ( data -- value )
    [ { } ] [
        [ dup empty? not ] [ parse-tnetstring ] produce nip
    ] if-empty ;

Parsing dictionaries is only a little more involved. We need a way to parse successive key/value pairs, checking some simple error conditions:

: parse-pair ( data -- extra value key )
    parse-tnetstring [
        dup [ "Unbalanced dictionary store" throw ] unless
        dup [ "Invalid value, null not allowed" throw ] unless
    ] dip ;

Then we can build the dictionary, repeatedly parsing key/value pairs:

: parse-dict ( data -- value )
    [ H{ } ] [
        [ dup empty? not ] [ parse-pair swap 2array ] produce
        nip >hashtable
    ] if-empty ;

And, to make the interface easy to use, we wrap our parse-tnetstring word, checking that there was no un-parsed remainder:

: tnetstring ( data -- value )
    parse-tnetstring swap [
        "Had trailing junk: %s" sprintf throw
    ] unless-empty ;

We can show that it works on one of Zed's more complex examples:

( scratchpad ) "34:5:hello\"22:11:12345678901#4:this\"]}" tnetstring .
H{ { "hello" { 12345678901 "this" } } }

The code (and some tests) for this is on Github.

Update: I added support for booleans and implemented writer words to match the reader words above. Everything is on my Github.


Unknown said...

Hi John,

Very new to Factor and really enjoying your blog.

Reading through this code, shouldn't

dup [ "Unbalanced dictionary store" throw ] unless


dup empty? [ "Unbalanced dictionary store" throw ] when

in parse-pair? Also, it seems the null check which follows immediately afterwards would only be relevant in the version of the parser extended to support nulls.

mrjbq7 said...

Hi Justin!

I think you're right on both counts. On the first, the remain would be empty (not false), so the check should be "when-empty".