I have wanted to parse timezone information files (also known as "tzfile") for a while, in particular so that Factor can begin to support named timezones in a smarter way.
Parsing
The tzfile is a binary format file from the tz database (also known as the "zoneinfo database"). Each tzfile starts with the four magic bytes "TZif", which we can check:
ERROR: bad-magic ;

: check-magic ( -- )
    4 read "TZif" sequence= [ bad-magic ] unless ;
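To try the magic check without touching the filesystem, the same comparison can be run against an in-memory byte array. This is only a sketch: tzif-magic? is a hypothetical predicate variant of check-magic introduced for illustration, and with-byte-reader from io.streams.byte-array stands in for a file stream.

```factor
USING: io io.encodings.binary io.streams.byte-array prettyprint
sequences ;
IN: scratchpad

! Hypothetical helper (not part of the original code): returns a
! boolean instead of throwing bad-magic.
: tzif-magic? ( -- ? )
    4 read "TZif" sequence= ;

! The bytes of "TZif" followed by the version byte "2"
B{ 84 90 105 102 50 } binary [ tzif-magic? ] with-byte-reader .

! Not a tzfile
B{ 0 1 2 3 4 } binary [ tzif-magic? ] with-byte-reader .
```

The first call should print t and the second f, because sequence= compares element by element and the bytes read from a binary stream are the character codes of the string.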
The tzfile then contains a header followed by a series of ttinfo structures and other information:
STRUCT: tzhead
    { tzh_reserved char[16] }
    { tzh_ttisgmtcnt be32 }
    { tzh_ttisstdcnt be32 }
    { tzh_leapcnt be32 }
    { tzh_timecnt be32 }
    { tzh_typecnt be32 }
    { tzh_charcnt be32 } ;

PACKED-STRUCT: ttinfo
    { tt_gmtoff be32 }
    { tt_isdst uchar }
    { tt_abbrind uchar } ;
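As a quick sanity check on the layout, the struct sizes can be compared against the file format. After the 4 magic bytes, the header is 16 bytes (the first of which is the format version byte, the rest reserved) plus six 4-byte counts, and each ttinfo record occupies 6 bytes on disk, which is why it is declared with PACKED-STRUCT rather than STRUCT. A minimal listener sketch, assuming the definitions above are loaded and that heap-size from alien.c-types reports struct sizes in the usual way:

```factor
USING: alien.c-types prettyprint ;

! 16 bytes of version/reserved data + 6 counts of 4 bytes each
tzhead heap-size .

! one 4-byte offset + 2 single-byte fields, with no padding
ttinfo heap-size .
```

If the layouts match the file format, this should print 40 and 6.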
We can store all the information parsed from the tzfile in a tuple:
TUPLE: tzfile header transition-times local-times types abbrevs
    leaps is-std is-gmt ;

C: <tzfile> tzfile
With a helper word to read 32-bit big-endian numbers, we can parse the entire file:
: read-be32 ( -- n )
    4 read be32 deref ;

: read-tzfile ( -- tzfile )
    check-magic tzhead read-struct dup {
        [ tzh_timecnt>> [ read-be32 ] replicate ]
        [ tzh_timecnt>> [ read1 ] replicate ]
        [ tzh_typecnt>> [ ttinfo read-struct ] replicate ]
        [ tzh_charcnt>> read ]
        [ tzh_leapcnt>> [ read-be32 read-be32 2array ] replicate ]
        [ tzh_ttisstdcnt>> read ]
        [ tzh_ttisgmtcnt>> read ]
    } cleave <tzfile> ;
All of that data specifies a series of local time types and transition times:
TUPLE: local-time gmt-offset dst? abbrev std? gmt? ;

C: <local-time> local-time

TUPLE: transition seconds timestamp local-time ;

C: <transition> transition
The abbreviated local time names are stored in a flattened array. It would be helpful to parse them out into a hashtable where the key is the starting character index in the flattened array:
:: tznames ( abbrevs -- assoc )
    0 [
        0 over abbrevs index-from dup
    ] [
        [ dupd abbrevs subseq >string 2array ] keep 1 + swap
    ] produce 2nip >hashtable ;
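To see what the resulting table looks like, tznames can be fed a small hand-built byte array. This is a listener sketch that assumes tznames from above is loaded; the input is a made-up abbreviation table holding the bytes of "PST" and "PDT", each followed by a zero byte.

```factor
USING: assocs kernel prettyprint ;

! Bytes: P S T 0 P D T 0 (illustrative data, not from a real tzfile)
B{ 80 83 84 0 80 68 84 0 } tznames

! Look up the names by their starting index in the flat array
[ 0 swap at . ] [ 4 swap at . ] bi
```

The two lookups should print "PST" and "PDT", since the first name starts at index 0 and the second starts right after the first zero byte, at index 4. These are the same indices that tt_abbrind holds in each ttinfo record.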
We can now construct an array of all the transition times and the local time types they represent. This is a lot of logic for a typical Factor word, so we use local variables to make it easier to understand:
:: tzfile>transitions ( tzfile -- transitions )
    tzfile abbrevs>> tznames :> abbrevs
    tzfile is-std>> :> is-std
    tzfile is-gmt>> :> is-gmt
    tzfile types>> [
        [
            {
                [ tt_gmtoff>> seconds ]
                [ tt_isdst>> 1 = ]
                [ tt_abbrind>> abbrevs at ]
            } cleave
        ] dip
        [ is-std ?nth dup [ 1 = ] when ]
        [ is-gmt ?nth dup [ 1 = ] when ] bi <local-time>
    ] map-index :> local-times
    tzfile transition-times>>
    tzfile local-times>> [
        [ dup unix-time>timestamp ] [ local-times nth ] bi*
        <transition>
    ] 2map ;
We want to wrap the parsed tzfile structure and the transitions in a tzinfo object that can be used later with timestamps. These tzinfo objects are created by parsing specific files, either by path or by zoneinfo name:
TUPLE: tzinfo tzfile transitions ;

C: <tzinfo> tzinfo

: file>tzinfo ( path -- tzinfo )
    binary [
        read-tzfile dup tzfile>transitions <tzinfo>
    ] with-file-reader ;

: load-tzinfo ( name -- tzinfo )
    "/usr/share/zoneinfo/" prepend file>tzinfo ;
Timestamps
Now that we have the tzinfo, we can convert a UTC timestamp into the timezone specified by our tzfile. This is accomplished by finding the transition time that affects the requested timestamp and adjusting by the GMT offset that it represents:
: find-transition ( timestamp tzinfo -- transition )
    [ timestamp>unix-time ] [ transitions>> ] bi*
    [ [ seconds>> before? ] with find drop ]
    [ swap [ 1 [-] swap nth ] [ last ] if* ] bi ;

: from-utc ( timestamp tzinfo -- timestamp' )
    [ drop instant >>gmt-offset ]
    [ find-transition local-time>> gmt-offset>> ] 2bi
    convert-timezone ;
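For instance, a UTC timestamp shortly after the 2002 fall-back transition (which happened at 09:00 UTC on October 27 for US Pacific time) should come back with the PST offset. A listener sketch, assuming the words above are loaded, that strftime comes from the formatting vocabulary, and that /usr/share/zoneinfo/US/Pacific exists on the system:

```factor
USING: calendar formatting prettyprint ;

! 10:00 UTC, one hour after the switch back to standard time
2002 10 27 10 0 0 instant <timestamp>

! Convert into the zone described by the tzfile
"US/Pacific" load-tzinfo from-utc

! Print the local wall-clock time
"%c" strftime .
```

Here find-transition should pick the October 27 transition, because it is the last one whose time is not after the requested timestamp. Since PST is eight hours behind UTC, the printed time is expected to read 02:00 rather than 03:00.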
Or normalize a timestamp that might be in a different timezone into the timezone specified by our tzfile (converting into and then out of UTC):
: normalize ( timestamp tzinfo -- timestamp' )
    [ instant convert-timezone ] [ from-utc ] bi* ;
Example
An example of it working: we take a date in PST that is just after a daylight saving transition and print it out, then subtract 10 minutes, normalize to the "US/Pacific" zoneinfo file, and print it out again, showing the time in PDT:
IN: scratchpad ! Take a time in PST
               2002 10 27 1 0 0 -8 hours <timestamp>

               ! Print it out
               dup "%c" strftime .
"Sun Oct 27 01:00:00 2002"

IN: scratchpad ! Subtract 10 minutes
               10 minutes time-

               ! Normalize to US-Pacific timezone
               "US/Pacific" load-tzinfo normalize

               ! Print it out
               "%c" strftime .
"Sun Oct 27 01:50:00 2002"
The code for this is available in the development version of Factor.