Saturday, July 27, 2013

METAR

METAR is a format for reporting weather information. Often used by pilots and meteorologists, METAR might be considered the most popular format in the world for sharing weather data. There are METAR parsers in Python (PyMETAR and python-metar), Ruby (metar-parser), Perl (metaf2xml and Geo::METAR), Java (jweather), and probably most other programming languages.

Today, we are going to build a METAR parser in Factor!

Note: If you are curious, a detailed METAR Coding Standards is available as part of the Federal Meteorological Handbook No. 1 - Surface Weather Observations and Reports.

Example

An example report for JFK Airport in New York looks like this:

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The "readable" output of our METAR parser translates that to:

IN: scratchpad "KJFK" metar.
Station       KJFK
Timestamp     Fri, 26 Jul 2013 23:51:00 GMT
Wind          from S (190°) at 11 knots
Visibility    10 statute miles
Weather       light rain
Sky condition scattered at 6000 ft, broken at 25000 ft
Temperature   23 °C
Dew point     18 °C
Altimeter     30.02 Hg
Remarks       AO2 SLP166 T02280183 10289 20228 51004

The remarks section includes additional weather observations and other information. For now, we will skip parsing the remarks and just begin with the standard fields contained in the body of the METAR report.

Each field will have a regular expression that can be used to recognize its presence in the report.

Station

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

Each station is identified by a 4-letter station identifier:

CONSTANT: re-station R! \w{4}!

Given a station identifier, we can lookup its current METAR report directly from the NOAA (National Oceanic and Atmospheric Administration):

: http-weather ( path -- result )
    "http://weather.noaa.gov/" prepend http-get nip ;

: metar ( station -- metar )
    "pub/data/observations/metar/stations/%s.TXT"
    sprintf http-weather ;

Timestamp

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The timestamp is a date and time of the report containing two-digit fields for day, hour, and minute, and the letter "Z" indicating UTC:

CONSTANT: re-timestamp R! \d{2}\d{2}\d{2}Z!

We will assume the current year and month when parsing it:

: parse-timestamp ( str -- str' )
    [ now [ year>> ] [ month>> ] bi ] dip
    2 cut 2 cut 2 cut drop [ string>number ] tri@
    0 instant <timestamp> timestamp>rfc822 ;
Note: This is not quite robust on the first day of the month when reading METAR reports that were generated the day before...

Wind

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The first element of wind is the direction (specified as degrees, like on a compass) and speed (specified in knots). The wind speed can optionally include a higher speed if the location is experiencing wind gusts:

CONSTANT: re-wind R! (VRB|\d{3})\d{2,3}(G\d{2,3})?KT!

The second element is specified if the wind is variable, coming from a range of directions, it is specified as one three-digit compass direction followed by a "V" and another three-digit compass direction:

CONSTANT: re-wind-variable R! \d{3}V\d{3}!

To help our users, we will first build a word to convert a compass direction to the nearest "human" direction (e.g., north, east, south, west):

CONSTANT: compass-directions H{
    { 0.0 "N" }
    { 45.0 "NE" }
    { 90.0 "E" }
    { 135.0 "SE" }
    { 180.0 "S" }
    { 225.0 "SW" }
    { 270.0 "W" }
    { 315.0 "NW" }
    { 360.0 "N" }
}

: direction>compass ( direction -- compass )
    45.0 round-to-step compass-directions at ;

Either the direction will be variable (VRB), or from a particular human and compass direction:

: parse-direction ( str -- str' )
    dup "VRB" = [ drop "variable" ] [
        string>number [ direction>compass ] keep
        "from %s (%s°)" sprintf
    ] if ;

That's everything we need to parse the basic wind direction and speed:

: parse-wind ( str -- str' )
    dup "00000KT" = [ drop "calm" ] [
        3 cut "KT" ?tail drop "G" split1
        [ parse-direction ] [ string>number ] [ string>number ] tri*
        [ "%s at %s knots with gusts to %s knots" sprintf ]
        [ "%s at %s knots" sprintf ] if*
    ] if ;

And, if it is provided, we can parse the variable wind direction also:

: parse-wind-variable ( str -- str' )
    "V" split1 [ string>number [ direction>compass ] keep ] bi@
    ", variable from %s (%s°) to %s (%s°)" sprintf ;

Visibility

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The visibility is specified as a number (usually whole, but sometimes a fraction) of statute miles:

CONSTANT: re-visibility R! [M]?\d+(/\d+)?SM!

In very poor conditions, the report might specify "M1/4SM" which means "less than 1/4 statute miles", so we want to handle that. We also want to parse "11/4" as "1+1/4":

: parse-visibility ( str -- str' )
    "M" ?head "less than " "" ? swap "SM" ?tail drop
    CHAR: / over index [ 1 > [ 1 cut "+" glue ] when ] when*
    string>number "%s%s statute miles" sprintf ;

Weather

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The present weather is specified by a severity modifier, an indicator if it is over the airport or in the general vicinity, and one or two weather descriptors:

CONSTANT: re-weather R! [+-]?(VC)?(\w{2}|\w{4})!

For use in our parser, we define all the two-letter codes used to represent weather:

CONSTANT: weather H{
    { "BC" "patches" }
    { "BL" "blowing" }
    { "BR" "mist" }
    { "DR" "low drifting" }
    { "DS" "duststorm" }
    { "DU" "widespread dust" }
    { "DZ" "drizzle" }
    { "FC" "funnel clouds" }
    { "FG" "fog" }
    { "FU" "smoke" }
    { "FZ" "freezing" }
    { "GR" "hail" }
    { "GS" "small hail and/or snow pellets" }
    { "HZ" "haze" }
    { "IC" "ice crystals" }
    { "MI" "shallow" }
    { "PL" "ice pellets" }
    { "PO" "well-developed dust/sand whirls" }
    { "PR" "partial" }
    { "PY" "spray" }
    { "RA" "rain" }
    { "RE" "recent" }
    { "SA" "sand" }
    { "SG" "snow grains" }
    { "SH" "showers" }
    { "SN" "snow" }
    { "SQ" "squalls" }
    { "SS" "sandstorm" }
    { "TS" "thuderstorm" }
    { "UP" "unknown" }
    { "VA" "volcanic ash" }
}

Weather severity defaults to moderate, but a leading "+" is used to indicate heavy weather and a "-" for light weather. Also, "+FC" is a special indicator of tornadoes and waterspouts:

: (parse-weather) ( str -- str' )
    dup "+FC" = [ drop "tornadoes or waterspouts" ] [
        dup first {
            { CHAR: + [ rest "heavy " ] }
            { CHAR: - [ rest "light " ] }
            [ drop f ]
        } case [
            2 group [ weather at ] map " " join
        ] dip prepend
    ] if ;

If "VC" is specified, it means the weather is in the vicinity (instead of overhead). We check to see if it was specified, and add the phrase if it was:

: parse-weather ( str -- str' )
    "VC" over subseq? [ "VC" "" replace t ] [ f ] if
    [ (parse-weather) ]
    [ [ " in the vicinity" append ] when ] bi* ;

Sky Conditions

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

Several sky conditions might be present in the message, each having a three-character indicator of cloud cover and an altitude, optionally including the type of clouds observed:

CONSTANT: re-sky-condition R! (\w{3}\d{3}(\w+)?|\w{3}|CAVOK)!

We should have a hashtable with mappings from the abbreviations to plain text:

CONSTANT: sky H{
    { "BKN" "broken" }
    { "FEW" "few" }
    { "OVC" "overcast" }
    { "SCT" "scattered" }
    { "SKC" "clear sky" }
    { "CLR" "clear sky" }
    { "NSC" "clear sky" }

    { "ACC" "altocumulus castellanus" }
    { "ACSL" "standing lenticular altocumulus" }
    { "CCSL" "cirrocumulus standing lenticular cloud" }
    { "CU" "cumulus" }
    { "SC" "stratocumulus" }
    { "SCSL" "stratocumulus standing lenticular cloud" }
    { "TCU" "towering cumulus" }
}

The altitudes present are in hundreds of feet above the ground. Also, we check for "CAVOK", which is sometimes used to mean "Ceiling and Visibility are OK":

: parse-sky-condition ( str -- str' )
    dup "CAVOK" = [
        drop "clear skies and unlimited visibility"
    ] [
        3 cut 3 cut
        [ sky at ]
        [ string>number " at %s00 ft" sprintf ]
        [ sky at [ " (%s)" sprintf ] [ f ] if* ]
        tri* 3append
    ] if ;

Temperature and Dew Point

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The temperature is specified as a number of degrees Celsius (preceded by an "M" to indicate a negative number), and optionally a dew point if available:

CONSTANT: re-temperature R! [M]?\d{2}/([M]?\d{2})?!

The parsing code is fairly easy in this case:

: parse-temperature ( str -- temperature dew-point )
    "/" split1 [
        [ f ] [
            "M" ?head [ string>number ] [ [ neg ] when ] bi*
            "%s °C" sprintf
        ] if-empty
    ] bi@ ;

Altimeter

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

The altimeter information is a four-digit numerical observation of the current air pressure at the surface, measured in inches of mercury or hectopascals:

CONSTANT: re-altimeter R! [AQ]\d{4}!

We can parse it, checking for A (USA) or Q (International) units:

: parse-altimeter ( str -- str' )
    unclip [ string>number ] [ CHAR: A = ] bi*
    [ 100 /f "%.2f Hg" sprintf ] [ "%s hPa" sprintf ] if ;

Remarks

KJFK 262351Z 19011KT 10SM -RA SCT060 BKN250 23/18 A3002 RMK AO2
SLP166 T02280183 10289 20228 51004

For the remarks, since this post is getting to be rather long, let's just save them raw and deal with parsing them later... however, as a preview of that, here is a short interpretation of this message:

  • AO2 - station with precipitation discriminator
  • SLP166 - sea-level pressure is 1016.6 hPa
  • T02280183 - 1-hr temperature 22.8 °C and dew point 18.3 °C
  • 10289 - 6-hr maximum temperature 28.9 °C
  • 20228 - 6-hr minimum temperature 22.8 °C
  • 51004 - atmospheric pressure increasing by 0.4 hPa

Finally!

We will define a report having these fields:

TUPLE: report station timestamp wind visibility weather
sky-condition temperature dew-point altimeter remarks ;

Okay, just a little bit more and we can finally get to parsing and displaying the message. We need a way to find a single token if present, removing it from the list, and returning the remainder of the message:

: find-one ( seq quot: ( elt -- ? ) -- seq elt/f )
    dupd find drop [ tail unclip ] [ f ] if* ; inline

Some tokens are present multiple times (such as weather and sky condition), so we want to return a sequence of all tokens found sequentially matching the requested type:

: find-all ( seq quot: ( elt -- ? ) -- seq elts )
    [ find-one swap ] keep '[
        dup [ f ] [ first @ ] if-empty
    ] [ unclip ] produce rot [ prefix ] when* ; inline

Parsing a report is a rather lengthy, but it basically splits the report into the body and remarks, then looks for each token in the order that it should be found in the message, setting it on the report object if present:

: <report> ( str -- report )
    [ report new ] dip [ blank? ] split-when
    { "RMK" } split1 [ body ] [ remarks ] bi* ;

: body ( report seq -- report )
    [ re-station matches? ] find-one
    [ pick station<< ] when*

    [ re-timestamp matches? ] find-one
    [ parse-timestamp pick timestamp<< ] when*

    [ re-wind matches? ] find-one
    [ parse-wind pick wind<< ] when*

    [ re-wind-variable matches? ] find-one
    [ parse-wind-variable pick wind>> prepend pick wind<< ] when*

    [ re-visibility matches? ] find-one
    [ parse-visibility pick visibility<< ] when*

    [ re-weather matches? ] find-all
    [ parse-weather ] map ", " join pick weather<<

    [ re-sky-condition matches? ] find-all
    [ parse-sky-condition ] map ", " join pick sky-condition<<

    [ re-temperature matches? ] find-one
    [
        parse-temperature
        [ pick temperature<< ]
        [ pick dew-point<< ] bi*
    ] when*

    [ re-altimeter matches? ] find-one
    [ parse-altimeter pick altimeter<< ] when*

    drop ;

: remarks ( report seq -- report )
    " " join >>remarks ;

And now, display our output in a table, wrapping the right column at 65 characters:

: row. ( name quot -- )
    '[
        [ _ write ] with-cell
        [ @ [ 65 wrap-string write ] when* ] with-cell
    ] with-row ; inline

: report. ( report -- )
    standard-table-style [
        {
            [ "Station" [ station>> ] row. ]
            [ "Timestamp" [ timestamp>> ] row. ]
            [ "Wind" [ wind>> ] row. ]
            [ "Visibility" [ visibility>> ] row. ]
            [ "Weather" [ weather>> ] row. ]
            [ "Sky condition" [ sky-condition>> ] row. ]
            [ "Temperature" [ temperature>> ] row. ]
            [ "Dew point" [ dew-point>> ] row. ]
            [ "Altimeter" [ altimeter>> ] row. ]
            [ "Remarks" [ remarks>> ] row. ]
        } cleave
    ] tabular-output nl ;

Combining this with our original metar word to download a current report and print it out, printing a nice message if the station requested does not exist or have a weather report:

: metar. ( station -- )
    [ metar <report> report. ]
    [ drop "%s METAR not found\n" printf ] recover ;

Phew, that plus a lot more parsing of the remarks section is available on my GitHub.

Monday, July 22, 2013

Logistic Map

A recent blog post discusses the relative performance between Haskell and C when calculating the billionth iteration of a particular logistic map function:

f(x) = 3.57 * x * (1-x)

A bit curious about how Factor would perform, I decided to do a comparison.

C

A simple implementation in C...

#include <stdio.h>

int main() {
    double x = 0.5;
    for(int i = 0; i < 1000000000; ++i){
        x = 3.57 * x * (1-x);
    }
    printf("x=%f\n", x);
}

...yields an answer in about 4.5 seconds on my machine:

$ time ./answer
x=0.495618

real 0m4.501s
user 0m4.495s
sys 0m0.005s

Factor

A simple implementation in Factor...

: logistic-chaos ( x -- y )
    [ 3.57 * ] [ 1 swap - * ] bi ; inline

: answer ( -- n )
    0.5 1,000,000,000 [ logistic-chaos ] times ;

...yields an answer in about 4.5 seconds on my machine!

IN: scratchpad [ answer ] time .
Running time: 4.478111574 seconds

0.4956180832934247
Note: Part of the speed comes from the Factor compiler using the inline request to determine that this is always called with floating point numbers. If the word was not inlined, the overhead of generic dispatch on every call would make this calculation take 19 seconds. If the calculation was not compiled into a word (as it is above), and just typed into the listener and executed with the non-optimizing compiler, the calculation would take 57 seconds.

Monday, July 15, 2013

Reading List

The Safari browser has a feature called "Reading List" that makes it easy to synchronize and bookmark pages for later reading. Recently, a Mac OS X Hint linked to a question on stackexchange.com asking how to get Reading List items as links.

The current answer is a Python script that:

  1. Finds the safari bookmarks plist file
  2. Copies the plist file to a temporary file
  3. Converts the plist file from binary to text format
  4. Parses it into a python object
  5. Finds the "Reading List" entry
  6. Prints out the saved URLs

I thought it would be fun to contrast that with a simple solution in Factor.

: reading-list ( -- urls )
    "~/Library/Safari/Bookmarks.plist" read-plist
    "Children" of [ "Title" of "com.apple.ReadingList" = ] find nip
    "Children" of [ "URLString" of ] map ;

You can try it out in the listener to get something like this:

IN: scratchpad reading-list .
V{
    "http://factorcode.org"
    "http://news.ycombinator.com"
    "http://reddit.com"
}