The other day I needed a parser for INI-style configuration files. When I couldn't find a convenient Factor vocabulary to do this, I decided to write one.
A basic configuration file could look like this:
[owner]
name=John Doe
e-mail=john.doe@example.com

[database]
host=127.0.0.1
# change to production when ready
port=1234
username=test
password="a really long string"
These configurations are essentially groups of name/value pairs, and can be naturally expressed as an assoc. We will be implementing a simple API for reading and writing:
: read-ini ( -- assoc )
: write-ini ( assoc -- )
: string>ini ( str -- assoc )
: ini>string ( assoc -- str )
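To make the assoc shape concrete, here is a hand-written sketch of roughly what the sample file above should turn into: a hashtable of section names, each mapping to a hashtable of name/value strings. The constant name sample-config is just an illustration (it is not part of the vocabulary), and the quotes around the password stay in the value because this simple version does not handle quoted strings.

```factor
USING: assocs kernel prettyprint ;
IN: scratchpad

! Hypothetical constant, written by hand to show the nested assoc shape.
! All values are strings; the simple parser does no type conversion.
CONSTANT: sample-config H{
    { "owner" H{
        { "name" "John Doe" }
        { "e-mail" "john.doe@example.com" }
    } }
    { "database" H{
        { "host" "127.0.0.1" }
        { "port" "1234" }
        { "username" "test" }
        { "password" "\"a really long string\"" }
    } }
}

! Look up a section first, then a name within that section.
"database" sample-config at "port" swap at .
```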
This implementation uses these vocabularies:
USING: arrays assocs combinators formatting hashtables io io.streams.string kernel make math sequences strings strings.parser ;
Some utility words are used to trim spaces from tokens, extract strings from section names (e.g., "[database]"), and remove comments from lines:
: unspace ( str -- str' )
    [ " \t\n\r" member? ] trim ;

: unwrap ( str -- str' )
    1 swap [ length 1 - ] keep subseq ;

: uncomment ( str -- str' )
    CHAR: # over index [ head ] when* ;
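A quick way to get a feel for these three words is to try them in the listener. The snippet below repeats the definitions so it can be pasted on its own; the input strings are made up for illustration.

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

! Definitions repeated from the article so the snippet runs on its own.
: unspace ( str -- str' ) [ " \t\n\r" member? ] trim ;
: unwrap ( str -- str' ) 1 swap [ length 1 - ] keep subseq ;
: uncomment ( str -- str' ) CHAR: # over index [ head ] when* ;

! Leading and trailing whitespace is trimmed; inner spaces are kept.
"  port = 1234  " unspace .

! The first and last characters (the brackets) are dropped.
"[database]" unwrap .

! Everything from the first '#' onward is removed, then trimmed.
"host=127.0.0.1 # dev box" uncomment unspace .

! A line without '#' passes through unchanged.
"username=test" uncomment .
```

Note that uncomment cuts at the first '#' it finds, so a '#' inside a value would also be treated as the start of a comment.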
There are a variety of parsing strategies we could use here. To keep things simple, we will parse the configuration file line by line, and assume that each line contains either a "[section]" or a "name=value" (but not both).
We know a line is a section if it starts with '[' and ends with ']':
: section? ( line -- ? )
    [ first CHAR: [ = ] [ last CHAR: ] = ] bi and ;
The current section is parsed and stored as a two-element array containing the name of the section and a vector of name/value pairs:
: [section] ( line -- section )
    unwrap unspace V{ } clone 2array ;
Each name/value is parsed and added to the vector of name/value pairs in the current section:
: name=value ( section line -- section' )
    CHAR: = over index cut rest [ unspace ] bi@ 2array
    over second push ;
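Here is a small sketch of how a section gets built up from a header line and a couple of name/value lines. The helper definitions are repeated so the snippet runs on its own, and the input lines are invented for the example. Because index finds the first '=', a value that itself contains '=' is kept whole.

```factor
USING: arrays kernel math prettyprint sequences ;
IN: scratchpad

! Words repeated from the article so the snippet runs on its own.
: unspace ( str -- str' ) [ " \t\n\r" member? ] trim ;
: unwrap ( str -- str' ) 1 swap [ length 1 - ] keep subseq ;
: [section] ( line -- section ) unwrap unspace V{ } clone 2array ;
: name=value ( section line -- section' )
    CHAR: = over index cut rest [ unspace ] bi@ 2array
    over second push ;

! Start a section, then add two pairs to its vector.
"[database]" [section]
"host = 127.0.0.1" name=value
"url=http://example.com/?a=b" name=value

! Print just the vector of name/value pairs.
second .
```

Two limitations of this simple version are worth knowing about: a line with no '=' makes index return f, so cut will throw an error, and a name=value line that appears before any section header has no section to push into.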
We will be using the make words. When we encounter a new section, or the end of the file, we will append the current section to the sequence of sections being built by make:
: section, ( section/f -- )
    [ first2 >hashtable 2array , ] when* ;

: parse-line ( section line -- section' )
    uncomment unspace [
        dup section?
        [ swap section, [section] ] [ name=value ] if
    ] unless-empty ;

: read-ini ( -- assoc )
    [
        f [ parse-line ] each-line section,
    ] { } make >hashtable ;
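If you have not used make before, this minimal sketch (not part of the parser) shows the idea: inside the quotation, the , word appends one element to a sequence that make is building, and make returns the finished sequence.

```factor
USING: kernel make math prettyprint sequences ;

! Each call to , appends one element; { } make returns an array.
[ "owner" , "database" , ] { } make .

! The sequence being built is dynamically scoped, so , can be
! called from inside nested combinators or from other words.
[ { 1 2 3 } [ dup * , ] each ] { } make .
```

That dynamic scoping is what lets section, be an ordinary word called from parse-line while read-ini owns the surrounding make.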
Implementing write-ini is pretty easy. It's just a matter of iterating over each section in the specified assoc and printing its name/value pairs with some minor structure:
: write-ini ( assoc -- )
    [
        [ "[%s]\n" printf ] dip
        [ "%s=%s\n" printf ] assoc-each nl
    ] assoc-each ;
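To see the output format, here is a small sketch that captures what write-ini prints for a one-section assoc. The definition is repeated so the snippet runs on its own, and the sample data is made up. I stuck to a single section with a single pair because hashtables do not promise any particular iteration order, so larger inputs may come out in a different order than they were written.

```factor
USING: assocs formatting io io.streams.string kernel prettyprint ;
IN: scratchpad

! write-ini repeated from the article so the snippet runs on its own.
: write-ini ( assoc -- )
    [
        [ "[%s]\n" printf ] dip
        [ "%s=%s\n" printf ] assoc-each nl
    ] assoc-each ;

! Capture the printed text as a string and show it.
H{ { "owner" H{ { "name" "John Doe" } } } }
[ write-ini ] with-string-writer .
! prints "[owner]\nname=John Doe\n\n"
```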
The string>ini and ini>string words are easy too. Since read-ini reads from the current input stream and write-ini writes to the current output stream, we can use string streams:
: string>ini ( str -- assoc )
    [ read-ini ] with-string-reader ;

: ini>string ( assoc -- str )
    [ write-ini ] with-string-writer ;
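The reason this works is that words like each-line and print use whatever the current input and output streams happen to be, and the string stream combinators rebind them for the duration of a quotation. A minimal sketch using only standard words (the sample text is made up):

```factor
USING: io io.streams.string make prettyprint ;

! each-line reads from the current input stream, which
! with-string-reader temporarily rebinds to the given string.
"[owner]\nname=John Doe"
[ [ [ , ] each-line ] { } make ] with-string-reader .

! print writes to the current output stream, which
! with-string-writer captures and returns as a string.
[ "[owner]" print "name=John Doe" print ] with-string-writer .
```

The first expression should show the two lines as an array of strings, and the second should show the captured text with a newline after each line.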
This was a really simple implementation. In addition to the basics, I wanted to be able to support:
- Embedded escape characters (e.g., "\t", "\n", etc.).
- Line continuations (e.g., multi-line values).
- Java .properties files.
- Liberal parsing of minor formatting errors.
- Both '#' and ';' comment characters.
- Quoted strings (e.g., name="value").
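As an illustration of how small some of these extensions can be, here is one possible sketch of comment stripping that accepts both characters. The word uncomment* is a hypothetical name of my choosing and is not necessarily how the published code does it; it cuts the line at the first '#' or ';', whichever comes first.

```factor
USING: kernel prettyprint sequences ;
IN: scratchpad

! Hypothetical variant of uncomment: find the index of the first
! character that is '#' or ';' and keep only the text before it.
: uncomment* ( str -- str' )
    dup [ "#;" member? ] find drop [ head ] when* ;

"host=127.0.0.1 ; dev box" uncomment* .
"port=1234 # default" uncomment* .

! Lines without a comment character pass through unchanged.
"username=test" uncomment* .
```

Like the original uncomment, this sketch does not know about quoted strings, so a ';' or '#' inside a quoted value would still be cut off.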
You can find all that and more (along with tests and some documentation) on my GitHub. I hope to contribute it to the main repository soon.
2 comments:
I've been puzzling over this for a while but don't understand why you used a vector sequence type below:
: [section] ( line -- section )
unwrap unspace V{ } clone 2array ;
... when in the 'section,' word definition, you unroll the section array and convert the vector into a hashtable.
Why not define [section] as:
: [section] ( line -- section )
unwrap unspace H{ } clone 2array ;
.. and save the conversion?
Good idea. I had originally thought to keep the properties in insertion order (maybe using an association list), but then changed to use hashing for random access.
I updated my code on GitHub to remove the intermediate vector.