The other day I needed a parser for INI-style configuration files. When I couldn't find a convenient Factor vocabulary to do this, I decided to write one.
A basic configuration file could look like this:
[owner]
name=John Doe
e-mail=john.doe@example.com
[database]
host=127.0.0.1 # change to production when ready
port=1234
username=test
password="a really long string"
These configurations are essentially groups of name/value pairs, and can be naturally expressed as an assoc. We will be implementing a simple API for reading and writing:
: read-ini ( -- assoc )
: write-ini ( assoc -- )
: string>ini ( str -- assoc )
: ini>string ( assoc -- str )
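To make the target representation concrete, here is roughly what the sample file above should look like once parsed: an outer assoc keyed by section name, with one inner assoc per section. The constant name sample-ini is made up for this illustration. Note that the basic parser described in this post keeps every value as a string and does not strip the quotes around the password.

```factor
USING: assocs hashtables kernel prettyprint ;

! Hand-written illustration of the parsed result (not parser output).
! sample-ini is a hypothetical name used only for this example.
CONSTANT: sample-ini H{
    { "owner" H{
        { "name" "John Doe" }
        { "e-mail" "john.doe@example.com" }
    } }
    { "database" H{
        { "host" "127.0.0.1" }
        { "port" "1234" }
        { "username" "test" }
        { "password" "\"a really long string\"" }
    } }
}

! Look up a section first, then a name inside that section.
"database" sample-ini at "port" swap at .
```

Looking up a value is two at calls, one per level, and the port comes back as the string "1234" rather than a number.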
This implementation uses these vocabularies:
USING: arrays assocs combinators formatting hashtables io io.streams.string kernel make math sequences strings strings.parser ;
Some utility words are used to trim spaces from tokens, extract strings from section names (e.g., "[database]"), and remove comments from lines:
: unspace ( str -- str' )
    [ " \t\n\r" member? ] trim ;
: unwrap ( str -- str' )
    1 swap [ length 1 - ] keep subseq ;
: uncomment ( str -- str' )
    CHAR: # over index [ head ] when* ;
There are a variety of parsing strategies we could use here. To keep things simple, we will be parsing the configuration file line-by-line. Also, we will make the assumption that each line contains either a "[section]" or a "name=value" (but not both).
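As a quick check of the three utility words above, the following sketch runs them on two made-up input lines. The definitions are repeated only so that the snippet loads on its own; the sample strings are assumptions for illustration.

```factor
USING: kernel math prettyprint sequences ;

! The three utility words, repeated so this snippet loads on its own.
: unspace ( str -- str' )
    [ " \t\n\r" member? ] trim ;
: unwrap ( str -- str' )
    1 swap [ length 1 - ] keep subseq ;
: uncomment ( str -- str' )
    CHAR: # over index [ head ] when* ;

! A name=value line with a trailing comment and stray whitespace:
! drop the comment first, then trim what is left.
"  port=1234   # default port" uncomment unspace .   ! "port=1234"

! A section header: strip the brackets, then any padding inside them.
"[ database ]" unwrap unspace .                      ! "database"
```

The order matters: uncomment runs before unspace so that the whitespace left in front of the '#' is trimmed as well.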
We know a line is a section if it starts with '[' and ends with ']':
: section? ( line -- ? )
    [ first CHAR: [ = ] [ last CHAR: ] = ] bi and ;
The current section is parsed and stored as a two-element array containing the name of the section and a vector of name/value pairs:
: [section] ( line -- section )
    unwrap unspace V{ } clone 2array ;
Each name/value is parsed and added to the vector of name/value pairs in the current section:
: name=value ( section line -- section' )
    CHAR: = over index cut rest [ unspace ] bi@
    2array over second push ;
We will be using the make words. When we encounter a new section, or the end of the file, we will append the current section to the sequence of sections being built by make:
: section, ( section/f -- )
    [ first2 >hashtable 2array , ] when* ;
: parse-line ( section line -- section' )
    uncomment unspace [
        dup section?
        [ swap section, [section] ] [ name=value ] if
    ] unless-empty ;
: read-ini ( -- assoc )
    [
        f [ parse-line ] each-line section,
    ] { } make >hashtable ;
Implementing write-ini is pretty easy. It's just a matter of iterating over all values in the specified assoc, and printing them out with some minor structure:
: write-ini ( assoc -- )
    [
        [ "[%s]\n" printf ] dip
        [ "%s=%s\n" printf ] assoc-each
        nl
    ] assoc-each ;
The string>ini and ini>string words are easy too. Both the read-ini and write-ini words operate on input and output streams, so we can use string streams:
: string>ini ( str -- assoc )
    [ read-ini ] with-string-reader ;
: ini>string ( assoc -- str )
    [ write-ini ] with-string-writer ;
This was a really simple implementation. In addition to the basics, I wanted to be able to support:
- Embedded escape characters (e.g., "\t", "\n", etc.).
- Line continuations (e.g., multi-line values).
- Java .properties files.
- Liberal parsing of minor formatting errors.
- Both '#' and ';' comment characters.
- Quoted strings (e.g., name="value").
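As a rough idea of how small some of these additions can be, here is a sketch of comment stripping that accepts both '#' and ';'. The word name uncomment-any is hypothetical and this is not the code from the repository; it also cuts at a comment character that appears inside a quoted value, so a complete version would need to handle quoting first.

```factor
USING: kernel prettyprint sequences ;

! Hypothetical replacement for uncomment (not the author's code):
! cut the line at the first '#' or ';', whichever appears first.
: uncomment-any ( str -- str' )
    dup [ "#;" member? ] find drop [ head ] when* ;

! The trailing space before the ';' is kept; unspace would remove it.
"host=127.0.0.1 ; staging" uncomment-any .   ! "host=127.0.0.1 "

! Lines without a comment character pass through unchanged.
"port=1234" uncomment-any .                  ! "port=1234"
```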
You can find all that and more (along with tests and some documentation) on my GitHub. I hope to contribute it to the main repository soon.
 
2 comments:
I've been puzzling over this for a while but don't understand why you used a vector sequence type below:
: [section] ( line -- section )
    unwrap unspace V{ } clone 2array ;
... when in the 'section,' word definition, you unroll the section array and convert the vector into a hashtable.
Why not define [section] as:
: [section] ( line -- section )
    unwrap unspace H{ } clone 2array ;
... and save the conversion?
Good idea. I had originally thought to keep the properties in insertion order (maybe using an association list), but then changed to use hashing for random access.
I updated my code on GitHub to remove the intermediate vector.