Saturday, April 5, 2014

Speedtest

Many people are familiar with Speedtest.net, which is used to test a network connection, displaying download speeds, upload speeds, and server latency. Implemented as a Flash-based interface, it can be used from a web browser to verify your internet provider is giving you what you pay for.

You might not be aware that the speedtest-cli project provides a way to check internet speed from the command line in a similar manner.

I thought it might be fun to implement an interface to Speedtest.net using Factor:

Closest Servers

Speedtest provides a list of available servers all over the world that can be used for testing, returned as XML. After parsing the XML document, we use a utility method to extract attributes for each server into an array:

: attr-map ( tag -- attrs )
    attrs>> [ [ main>> ] dip ] H{ } assoc-map-as ;

: speedtest-servers ( -- servers )
    "http://www.speedtest.net/speedtest-servers.php"
    http-get nip string>xml
    "server" deep-tags-named [ attr-map ] map ;

Calculating the geographical distance between two points, specified by latitude and longitude:

: radians ( degrees -- radians ) pi * 180 /f ; inline

:: geo-distance ( lat1 lon1 lat2 lon2 -- distance )
    6371 :> radius ! km
    lat2 lat1 - radians :> dlat
    lon2 lon1 - radians :> dlon
    dlat 2 / sin sq dlon 2 / sin sq
    lat1 radians cos lat2 radians cos * * + :> a
    a sqrt 1 a - sqrt fatan2 2 * :> c
    radius c * ;

This lets us find the closest server to a given geographic location:

: lat/lon ( assoc -- lat lon )
    [ "lat" of ] [ "lon" of ] bi [ string>number ] bi@ ;

: server-distance ( server lat lon -- server )
    '[ lat/lon _ _ geo-distance "distance" ] keep
    [ set-at ] keep ;

: closest-servers-to ( lat lon -- servers )
    [ speedtest-servers ] 2dip '[ _ _ server-distance ] map
    [ "distance" of ] sort-with ;

The available Speedtest configuration provides our latitude and longitude, allowing us to sort the server list by geographic distance:

TUPLE: config client times download upload ;

C: <config> config

: speedtest-config ( -- config )
    "http://www.speedtest.net/speedtest-config.php"
    http-get nip string>xml {
        [ "client" deep-tag-named attr-map ]
        [ "times" deep-tag-named attr-map ]
        [ "download" deep-tag-named attr-map ]
        [ "upload" deep-tag-named attr-map ]
    } cleave <config> ;

: closest-servers ( -- servers )
    speedtest-config client>> lat/lon closest-servers-to ;

Best Server

We can calculate latency by downloading a small latency.txt file and timing how long it takes:

: (server-latency) ( server -- ms )
    "url" of >url URL" latency.txt" derive-url
    [ http-get nip "test=test\n" = ] benchmark 1,000,000 /f
    3,600,000 ? ;

After calculating latency, we save it for later use:

: server-latency ( server -- server )
    [ (server-latency) "latency" ] keep [ set-at ] keep ;

The "best" server that we will use for testing is the one with the lowest latency, checking the five closest servers to our location:

: best-server ( -- server )
    closest-servers 5 short head
    [ server-latency ] parallel-map
    [ "latency" of ] sort-with first ;

Upload Speed

To calculate upload speed, we upload several document sizes (filling the content with zeroes) and time how long it takes:

: upload-data ( size -- data )
    9 - CHAR: 0 <string> "content1=" prepend ;

: (upload-speed) ( server -- Mbps )
    "url" of >url { 250,000 500,000 } [
        [
            upload-data [ swap http-put 2drop ] keep length
        ] with map-sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;

After calculating upload speed, we save it for later use:

: upload-speed ( server -- server )
    [ (upload-speed) "upload" ] keep [ set-at ] keep ;

Download Speed

To calculate download speed, we download several files with varying sizes in parallel and time how long it takes:

: download-urls ( server -- urls )
    "url" { 350 500 750 1000 } 
    [ dup "random%sx%s.jpg" sprintf >url derive-url ] with map ;

: (download-speed) ( server -- Mbps )
    download-urls 4 swap <array> [
        [ [ http-get nip length ] map-sum ] parallel-map sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;

After calculating download speed, we save it for later use:

: download-speed ( server -- server )
    [ (download-speed) "download" ] keep [ set-at ] keep ;

Text Results

With all of that built, we can build a word to run a Speedtest, printing out the results as text:

: run-speedtest ( -- server )
    "Selecting best server based on ping..." print flush
    best-server dup {
        [ "sponsor" of ]
        [ "name" of ]
        [ "distance" of ]
        [ "latency" of ]
    } cleave "Hosted by %s (%s) [%0.2f km]: %s ms\n" printf
    "Testing download speed" print flush download-speed
    dup "download" of "Download: %0.2f Mbit/s\n" printf
    "Testing upload speed" print flush upload-speed
    dup "upload" of "Upload: %0.2f Mbit/s\n" printf ;

Graphic Results

It would be nice if we could show the reports graphically, and as it turns out, its not too hard. We just have to upload the results to speedtest.net in the same way their Flash application does, and then display the image that is created for you.

: make-result ( server -- result )
    [
        {
            [ "download" of 1,000 * >integer "download" ,, ]
            [ "latency" of >integer "ping" ,, ]
            [ "upload" of 1,000 * >integer "upload" ,, ]
            [ drop "" "promo" ,, ]
            [ drop "pingselect" "startmode" ,, ]
            [ "id" of "recommendedserverid" ,, ]
            [ drop "1" "accuracy" ,, ]
            [ "id" of "serverid" ,, ]
            [
                [ "latency" of ]
                [ "upload" of 1,000 * ]
                [ "download" of 1,000 * ] tri
                "%d-%d-%d-297aae72" sprintf md5 checksum-bytes
                hex-string "hash" ,,
            ]
        } cleave
    ] { } make ;

: submit-result ( server -- result-id )
    make-result "http://www.speedtest.net/api/api.php"
    <post-request> [
        [
            "http://c.speedtest.net/flash/speedtest.swf"
            "referer"
        ] dip header>> set-at
    ] keep http-request nip query>assoc "resultid" of ;

Speedtest

Putting this all together, we can run the Speedtest, submit the results, then display the test results as an image.

: speedtest ( -- )
    run-speedtest submit-result "Share results: " write
    "http://www.speedtest.net/result/%s.png" sprintf
    [ dup >url write-object nl ] [ http-image. ] bi ;

Some things that I would like to improve:

  • The Speedtest configuration actually specifies the details of download and upload sizes, the amount of parallelism, and the duration of the test, we should use it.
  • The http-get word needs an overall timeout so we can scale between very slow and very fast connection speeds.
  • The Speedtest graphical result images are "retina" when viewed in the web browser, but are not when downloaded from Factor or wget.
  • Factor needs an easier way to create a queue of work that is processed by several worker threads, for convenience I just used one of the concurrent combinators.

The code for this is on my GitHub.

Sunday, February 9, 2014

inet_ntoa and inet_aton

I was reading an article about micro optimizing int to IP address conversions. The author was trying to convert a 32-bit integer representation of an IP address into the more typical string representation using Common Lisp.

This is basically what the standard C library functions inet_ntoa and inet_aton do. I thought it might be fun to implement this in Factor and compare performance with the C versions.

alien

First, lets use the alien FFI vocabulary to allow the C functions to be called from Factor:

FUNCTION: c-string inet_ntoa ( uint32_t addr ) ;

FUNCTION: int inet_aton ( c-string s, uint32_t *addr ) ;

We can call inet_ntoa directly, but to call inet_aton, we need a simple wrapper that calls it, preserves the result, and checks for success or failure:

: inet-aton ( x -- y )
    { uint32_t } [ inet_aton 1 assert= ] with-out-parameters ;

We can test to see that it works:

IN: scratchpad 81952074 inet_ntoa .
"74.125.226.4"

IN: scratchpad "74.125.226.4" inet-aton .
81952074

Parsing 1 million integers with inet_ntoa takes 1.016 seconds.

Parsing 1 million IP addresses with inet_aton takes 0.626 seconds.

simple

Okay, what if we want to implement these ourselves?

Note: Unlike inet_ntoa (which is in network byte order), we will assume little endian like the original author that spawned this adventure.

Converting an integer into an IP address string by taking each octet of the 32-bit number (based on the pseudocode used in the original article):

: ipv4-ntoa ( integer -- ip )
    { 0x1000000 0x10000 0x100 0x1 }
    [ /i 8 bits number>string ] with map "." join ;

Converting an IP address into an integer is as easy as splitting on the dots and performing the reverse operation:

: ipv4-aton ( ip -- integer )
    "." split [ string>number ] map
    { 0x1000000 0x10000 0x100 0x1 } v. ;

We can test that it works:

IN: scratchpad 1249763844 ipv4-ntoa .
"74.125.226.4"

IN: scratchpad "74.125.226.4" ipv4-aton .
1249763844

Parsing 1 million integers with ipv4-ntoa takes 0.653 seconds.

Parsing 1 million IP addresses with ipv4-aton takes 0.738 seconds.

faster

In the spirit of the original article, we will try some micro-optimizations (with some corresponding loss in readability) including type annotations.

Our versions use the generalized number parsing words, string>number and number>string. Specialized (less general) versions can give us additional performance:

TYPED: byte>string ( byte: fixnum -- str )
    $[ 256 iota [ number>string ] map ] nth ;

TYPED: string>byte ( str: string -- byte )
    0 [ [ 10 * ] dip CHAR: 0 - + ] reduce ;

We make a few other changes to use shifting and a slightly different approach:

TYPED: ipv4-ntoa2 ( integer: fixnum -- ip )
    { -24 -16 -8 0 } [ 8 shift-mod byte>string ] with map
    "." join ;

TYPED: ipv4-aton2 ( ip: string -- integer )
    "." split { 24 16 8 0 }
    [ [ string>byte ] dip shift ] [ + ] 2map-reduce ;

Parsing 1 million integers with ipv4-ntoa2 takes 0.436 seconds!

Parsing 1 million IP addresses with ipv4-aton2 takes 0.496 seconds!

fastest

If we really want to do more micro-optimizations, and produce some ugly but fast code, then we can change the code to ensure more fixnum operations:

TYPED: byte>string2 ( byte: fixnum -- str )
    $[ 256 iota [ number>string ] map ] nth-unsafe ;

TYPED: string>byte2 ( str: string -- byte )
    [ length iota 0 ] keep [
        string-nth-fast
        [ 10 fixnum*fast ] dip CHAR: 0 fixnum-fast fixnum+fast
    ] curry reduce ;

TYPED: ipv4-ntoa3 ( integer: fixnum -- ip )
    $[ { -24 -16 -8 0 } [ [ 8 shift-mod ] curry ] map ] cleave
    [ byte>string2 ] 4 napply 4array "." join ;

TYPED: ipv4-aton3 ( ip: string -- integer )
    "." split first4 [ string>byte2 ] 4 napply
    [ 24 16 8 [ fixnum-shift-fast ] tri-curry@ tri* ] dip
    [ fixnum+fast ] tri@ ;

Parsing 1 million integers with ipv4-ntoa3 takes 0.285 seconds!

Parsing 1 million IP addresses with ipv4-aton3 takes 0.355 seconds!

I have committed more complete versions of ipv4-ntoa and ipv4-aton (with support for parsing IPv4 addresses) to the development version of Factor.

Wednesday, January 22, 2014

Caesar Cipher

A Caesar cipher is a very simple encryption technique where each letter is shifted a fixed number of characters in the alphabet. It is named after Julius Caesar, who apparently used this technique in some of his letters.

For example, if we were to encode "FACTOR" by shifting each character to the right by three letters, we would get "IDFWRU". The "F" shifts to "I", the "A" shifts to "D", the "C" shifts to "F", etc.

Let's implement this in Factor!

First, we implement a word to shift a character (uppercase by convention) a specified number of letters. Using "A" as our "zero" point by subtracting, shifting modulo 26 character ascii alphabet, then re-adding the ascii value for "A" (65):

: caesar-shift ( ch n -- ch' )
    [ CHAR: A - ] dip + 26 rem CHAR: A + ;

Next, a word for shifting every letter in a string (preserving numbers and punctuation):

: caesar-map ( str n -- str' )
    '[ dup CHAR: A CHAR: Z between? [ _ caesar-shift ] when ] map ;

This lets us implement the encrypt and decrypt words. Encrypting is simple and decrypting is mapping with a negative shift number:

: caesar-encrypt ( plain n -- encrypted ) caesar-map ;

: caesar-decrypt ( encrypted n -- plain ) neg caesar-map ;

Trying it out:

IN: scratchpad "HELLO, WORLD!" 3 caesar-encrypt .
"KHOOR, ZRUOG!"

IN: scratchpad "KHOOR, ZRUOG!" 3 caesar-decrypt .
"HELLO, WORLD!"

Friday, December 13, 2013

Monte Carlo

The Monte Carlo method is a method of estimation that uses random simulation to solve physical or mathematical problems. Named after the Monte Carlo Casino, you can think of the method as playing a game of chance many times and recording the outcomes (such as how frequently one wins or loses).

One classic example from the Wikipedia article is estimating the value of π (although there are many other ways to approximate the value of π):

For example, consider a circle inscribed in a unit square. Given that the circle and the square have a ratio of areas that is π/4, the value of π can be approximated using a Monte Carlo method:

  1. Draw a square on the ground, then inscribe a circle within it.
  2. Uniformly scatter some objects of uniform size (grains of rice or sand) over the square.
  3. Count the number of objects inside the circle and the total number of objects.
  4. The ratio of the two counts is an estimate of the ratio of the two areas, which is π/4. Multiply the result by 4 to estimate π.

You can visualize how this works by seeing the estimate get more correct as you increase the number of points:

We can generate a random point inside the unit circle by generating two random-unit values:

: random-point ( -- x y )
    random-unit random-unit ;

Using the Pythagorean theorem, we can calculate the distance from the zero point. If the distance is less than or equal to 1.0, then it is inside the circle:

: inside-circle? ( x y -- ? )
    [ sq ] bi@ + sqrt 1.0 <= ;

We can then estimate the value of π by computing a number of points, taking the percentage that are inside the circle and multiplying by 4:

: estimate-pi ( points -- pi-estimate )
    0 swap [
        [ random-point inside-circle? [ 1 + ] when ] times
    ] keep /f 4 * ;

We can run this for varying numbers of points and see how it gets more accurate:

IN: scratchpad { 100 1,000 10,000 100,000 1,000,000 10,000,000 }
               [ estimate-pi . ] each
3.2
3.168
3.162
3.15176
3.14288
3.1418212

Tuesday, December 10, 2013

UU Encoding

Just a quick note, as of a couple months ago, Factor has support for uuencoding (and uudecoding)!

You can perform a uuencode:

IN: scratchpad "Factor" string>uu print
begin
&1F%C=&]R
end

...and also a uudecode:

IN: scratchpad """
               begin 644 factor.txt
               &1F%C=&]R
               end
               """ uu>string .
"Factor"

Right now, it operates on text directly and doesn't preserve the file name and permissions in the begin header, but that would be an easy improvement.

The code for this is available in the development version of Factor.

Thursday, December 5, 2013

Humanhash

Zachary Voase published the humanhash project on GitHub, making "human-readable representations of digests". Below is a compatible implementation in Factor.

To show how it will work, we use the example from his humanhash README:

IN: scratchpad CONSTANT: digest "7528880a986c40e78c38115e640da2a1"

IN: scratchpad digest humanhash .
"three-georgia-xray-jig"

IN: scratchpad digest 6 humanhash-words .
"high-mango-white-oregon-purple-charlie"

IN: scratchpad human-uuid4 2array .
{
    "28129036-75a7-4c87-984b-4b32231e0a0d"
    "nineteen-bluebird-oxygen-edward"
}

Implementation

We need a list of 256 words, one to represent each possible byte:

CONSTANT: default-wordlist {
    "ack" "alabama" "alanine" "alaska" "alpha" "angel" "apart"
    "april" "arizona" "arkansas" "artist" "asparagus" "aspen"
    "august" "autumn" "avocado" "bacon" "bakerloo" "batman" "beer"
    "berlin" "beryllium" "black" "blossom" "blue" "bluebird" "bravo"
    "bulldog" "burger" "butter" "california" "carbon" "cardinal"
    "carolina" "carpet" "cat" "ceiling" "charlie" "chicken" "coffee"
    "cola" "cold" "colorado" "comet" "connecticut" "crazy" "cup"
    "dakota" "december" "delaware" "delta" "diet" "don" "double"
    "early" "earth" "east" "echo" "edward" "eight" "eighteen"
    "eleven" "emma" "enemy" "equal" "failed" "fanta" "fifteen"
    "fillet" "finch" "fish" "five" "fix" "floor" "florida"
    "football" "four" "fourteen" "foxtrot" "freddie" "friend"
    "fruit" "gee" "georgia" "glucose" "golf" "green" "grey" "hamper"
    "happy" "harry" "hawaii" "helium" "high" "hot" "hotel"
    "hydrogen" "idaho" "illinois" "india" "indigo" "ink" "iowa"
    "island" "item" "jersey" "jig" "johnny" "juliet" "july"
    "jupiter" "kansas" "kentucky" "kilo" "king" "kitten" "lactose"
    "lake" "lamp" "lemon" "leopard" "lima" "lion" "lithium" "london"
    "louisiana" "low" "magazine" "magnesium" "maine" "mango" "march"
    "mars" "maryland" "massachusetts" "may" "mexico" "michigan"
    "mike" "minnesota" "mirror" "mississippi" "missouri" "mobile"
    "mockingbird" "monkey" "montana" "moon" "mountain" "muppet"
    "music" "nebraska" "neptune" "network" "nevada" "nine"
    "nineteen" "nitrogen" "north" "november" "nuts" "october" "ohio"
    "oklahoma" "one" "orange" "oranges" "oregon" "oscar" "oven"
    "oxygen" "papa" "paris" "pasta" "pennsylvania" "pip" "pizza"
    "pluto" "potato" "princess" "purple" "quebec" "queen" "quiet"
    "red" "river" "robert" "robin" "romeo" "rugby" "sad" "salami"
    "saturn" "september" "seven" "seventeen" "shade" "sierra"
    "single" "sink" "six" "sixteen" "skylark" "snake" "social"
    "sodium" "solar" "south" "spaghetti" "speaker" "spring"
    "stairway" "steak" "stream" "summer" "sweet" "table" "tango"
    "ten" "tennessee" "tennis" "texas" "thirteen" "three" "timing"
    "triple" "twelve" "twenty" "two" "uncle" "undress" "uniform"
    "uranus" "utah" "vegan" "venus" "vermont" "victor" "video"
    "violet" "virginia" "washington" "west" "whiskey" "white"
    "william" "winner" "winter" "wisconsin" "wolfram" "wyoming"
    "xray" "yankee" "yellow" "zebra" "zulu"
}

One of the inputs is the number of words to produce. An error is produced if fewer bytes are provided than the number of words requested:

ERROR: too-few-bytes seq #words ;

: check-bytes ( seq #words -- seq #words )
    2dup [ length ] [ < ] bi* [ too-few-bytes ] when ; inline

Input is grouped into subsequences, where the number of subsequences is the number of words requested in the output. It's a little bit odd, but basically makes groups and then puts any remainder in the last group:

: group-words ( seq #words -- groups )
    [ dupd [ length ] [ /i ] bi* group ]
    [ 1 - cut concat suffix ] bi ; inline

Our input bytes are compressed, first by grouping them into words, then by XORing the bytes in each word:

: compress-bytes ( seq #words -- newseq )
    check-bytes group-words [ 0 [ bitxor ] reduce ] map ;

Our input will either be a byte-array, or a hex-string with every two characters representing the hexadecimal value of each byte:

: byte-string ( hexdigest -- seq )
    dup byte-array? [ 2 <groups> [ hex> ] map ] unless ;

Making a humanhash is simply converting the input, compressing into bytes representing each word, looking up the word from the word list, and joining with a requested separator:

: make-humanhash ( hexdigest #words wordlist sep -- hash )
    { [ byte-string ] [ compress-bytes ] [ nths ] [ join ] } spread ;

We provide a way to hash into a requested number of words, or four by default:

: humanhash-words ( hexdigest #words -- hash )
    default-wordlist "-" make-humanhash ;

: humanhash ( hexdigest -- hash )
    4 humanhash-words ;

And since the humanhash project includes a way to create humanhash'd uuids, we do also:

: human-uuid4 ( -- uuid hash )
    uuid4 dup [ CHAR: - = not ] filter humanhash ;