Thursday, December 18, 2014

Gopher

The Gopher protocol is relatively dated now, but when it was first released in 1991, it had a number of modern features that we would later enjoy in the World Wide Web. In particular, in RFC 1436, it lists these features:

  • A file-like hierarchical arrangement that would be familiar to users.
  • A simple syntax.
  • A system that can be created quickly and inexpensively.
  • Extending the file system metaphor, such as searches.

We're going to build a simple word to let us look through Gopherspace using Factor.

Using the URLs vocabulary, we will build a tool to fetch documents from a Gopher server using a URL that looks like this:

gopher://gopher.floodgap.com/0/gopher/proxy

This specifies a host, an optional port (defaulting to 70 if not specified), and a path which includes an item type and a selector identifying the document to obtain.

Once a network connection is opened, we can retrieve the specified document by sending the selector followed by a CRLF (carriage return and line feed, ASCII bytes 13 and 10 respectively), and then reading the response:

: get-selector ( selector -- document )
   "/" split1 "" or nip write "\r\n" write flush contents ;

: gopher ( url -- document )
    >url {
        [ host>> ]
        [ port>> 70 or <inet> ascii ]
        [ path>> rest ]
    } cleave '[ _ get-selector ] with-client ;

The item type, which we are ignoring in the code above, can be used a bit like a filename extension to handle documents of different types in particular ways. Some common types that you might see:

  • 0 - plain text
  • 1 - menus
  • 9 - binary
  • s - sound
  • g - GIF images

Right now, our code assumes that all the documents we will fetch are ASCII, and it doesn't have any special handling for menus, or support for a query string that would allow using Gopher "search servers". I added some basic support for those items in the new gopher vocabulary that I committed yesterday. In addition, I built a simple Gopher browser complete with history support and ability to view GIF images in the gopher.ui vocabulary.

Here's how you would use it:

IN: scratchpad USE: gopher.ui

IN: scratchpad "gopher://gopher.floodgap.com/1"
               open-gopher-window

That will pop up a window that looks like this, with clickable links and everything:

It's neat using some of these early protocols, both because they tend to be simpler, but especially when you see that they have a passionate following. As of December 2014, Veronica-2 has indexed 150 gopher servers with over 3 million unique selectors. You can see current stats by going to:

gopher://gopher.floodgap.com/0/v2/vstat

Check it out!

Wednesday, December 3, 2014

Binary Puzzle

I've enjoyed being a subscriber to The ListServe, a mailing list where each day one subscriber wins a lottery to write to the entire list of over 24,000 subscribers. There has been a lot of life advice, stories, recipes, music and book recommendations, and even puzzles posted to the list. You can see some of the past posts on The ListServe Blog.

In today's post, someone includes a quick puzzle, which (semi-spoiler alert!) I wanted to show how to solve using Factor:

One preachy puzzle (easy to solve with the aid of the Internet, and therein lies the irony): 01001000 01110101 01101101 01100001 01101110 01110011 00100000 01100001 01110010 01100101 00100000 01101101 01101111 01110010 01100101 00100000 01110100 01101000 01100001 01101110 00100000 01100100 01100001 01110100 01100001

At first glance, it looks like binary numbers separated by spaces in a sequence with some meaning, probably some kind of sentence, probably in English, and probably ASCII encoded.

Let's try and solve it with that in mind:

"""
01001000 01110101 01101101 01100001 01101110 01110011
00100000 01100001 01110010 01100101 00100000 01101101
01101111 01110010 01100101 00100000 01110100 01101000
01100001 01101110 00100000 01100100 01100001 01110100
01100001
"""
[ blank? ] split-when harvest [ bin> ] "" map-as .

It's a neat message, but I won't spoil the answer for you.

Tuesday, December 2, 2014

Heaps

Yesterday, I committed a performance improvement to the heap implementation in Factor.

There's an interesting comment on the pypy implementation of the "heapq" module that discusses a performance optimization that takes advantage of the fact that sub-trees of the heap satisfy the heap invariant. The strategy is to reduce the number of comparisons that take place when sifting items into their proper place in the heap.

Below, I demonstrate the time it takes to run our heaps benchmark and to sort 1 million random numbers using heapsort, before and after making the change.

Before:

IN: scratchpad gc [ heaps-benchmark ] time
Running time: 0.224253523 seconds

IN: scratchpad 1,000,000 random-units gc [ heapsort drop ] time
Running time: 2.210408992 seconds

After:

IN: scratchpad gc [ heaps-benchmark ] time
Running time: 0.172660576 seconds

IN: scratchpad 1,000,000 random-units gc [ heapsort drop ] time
Running time: 1.688299185 seconds

Not a bad improvement!

Friday, November 28, 2014

Prime Sextuplets

A couple of days ago, the Riecoin project (a virtual currency and distributed computing platform) posted a press release announcing they have quietly broken the record for the largest prime number sextuplet:

A prime sextuplet consists of six prime numbers packed together as tightly as possible. For sextuplets, "as tightly as possible" means that the largest is 16 plus the smallest of the numbers.

The smallest prime sextuplet is {7, 11, 13, 17, 19, 23} and generally they take the form of a prime number N such that these six numbers are all prime: {N+0, N+4, N+6, N+10, N+12, N+16}.

It's kind of neat that you can use Factor to confirm their result:

USE: math.primes
689702036532655186685581028503873005405874329363269153979622096014346785019088707220301256048568366498602811964467654774670820091972463194208186476882699386082393716593309811371422836387527549653095824492750394092045532275098135652952423078356472379653908988713872759020566218763497459878106775183203857648413997381256598543877696056491021898353604500233203798629403923570165634119564742536549584121471881689569379964364152289494693118199337926886001843460903637314310532482306798517536171711379098711480663572269535063407688377687623951196977582998449120940358830276897328119483620011984713125859631603652231485340570118364685553782567043880668996080767
{ 0 4 6 10 12 16 } [ + ] with map [ prime? ] all? .

Factor uses an implementation of the probabilistic Miller-Rabin primality test in the math.primes.miller-rabin vocabulary, which on my laptop takes just over 3 seconds.

Saturday, November 22, 2014

Factor Tutorial

Andrea Ferretti has posted a great tutorial about Factor!

From the announcement on the mailing list:

Factor has a lot of documentation in the listener, but I have
tried to cover some topics that are present in the official
docs, but scattered throughout it, so that they were not clear
to me at the beginning.

These include for instance:

- the central concept is function composition, the stack is more
 of a detail
- how simple is to deploy program and scripts
- what tools are there: linter, inspector, unit testing support,
 reverse lookup of function uses...
- what model of multithreading and async I/O are used
- how to make use of multiple cores
- in what sense Factor has an object system
 and more

Check it out!

Sunday, November 2, 2014

Factor 0.97 now available

"If birds can glide for long periods of time, then... why can’t I?" - Orville Wright

I'm very pleased to announce the release of Factor 0.97!

OS/CPUWindowsMac OS XLinux
x86
x86-64

Source code: 0.97

This release is brought to you with over 1,400 commits by the following individuals:

Alex Vondrak, Andrew Pennebaker, Benjamin Pollack, Björn Lindqvist, CW Alston, Doug Coleman, Erik Charlebois, Fred Alger, Iskander Sitdikov, John Benediktsson, Jon Harper, Loryn Jenkins, Paul Woolcock, Roc King, Samuel Tardieu, Steven Stewart-Gallus, and @Profpatsch

Besides some bug fixes and library improvements, I want to highlight the following changes:

Some possible backwards compatibility issues:

  • Fixed mask? in math.bitwise to be more correct
  • Fixed bias in Mersenne Twister random number generator
  • Better support for shebang (no longer need a space after #!)
  • io-error now lives in the libc vocabulary
  • sender stubs in cocoa.messages now named by method signature
  • filter now allocates length of seq, not exemplar.
  • Removed make-assoc in favor of explicit get's.

Some of the improvements to FUEL, Factor's emacs mode:

  • Modernize for emacs 24.3
  • Prepare FUEL to be uploaded to MELPA
  • Change font locking and syntax highlighting
  • Make fuel-help work for vocabularies also
  • New minor mode: fuel-autohelp-mode
  • Fix word help to use correct vocabulary using list
  • Variable controlling whether fuel-mode is loaded automatically
  • Fixes to table rendering

What is Factor

Factor is a concatenative, stack-based programming language with high-level features including dynamic types, extensible syntax, macros, and garbage collection. On a practical side, Factor has a full-featured library, supports many different platforms, and has been extensively documented.

The implementation is fully compiled for performance, while still supporting interactive development. Factor applications are portable between all common platforms. Factor can deploy stand-alone applications on all platforms. Full source code for the Factor project is available under a BSD license.


New libraries:


Improved libraries:

Thursday, October 23, 2014

cURL

The cURL project is a command-line tool and library for transferring data using URL syntax supporting many (many!) protocols. I recently contributed a simple wrapper for libcurl to Factor and wanted to show a little bit about how it was made.

We have a download-to word that uses our HTTP client to download resources from the web. I wanted to show how to build a similar word to download resources using libcurl.

FFI

We will use the alien vocabulary to interface with the libcurl C library, defining words to initialize, perform a request, and cleanup

TYPEDEF: void CURL

FUNCTION: CURL* curl_easy_init ( ) ;

FUNCTION: int curl_easy_perform ( CURL* curl ) ;

FUNCTION: void curl_easy_cleanup ( CURL* curl ) ;

Before we perform the request, we will want to set various options to control what request is made, using function aliases to allow passing different types of values based on the numeric key:

FUNCTION-ALIAS: curl_easy_setopt_long
int curl_easy_setopt ( CURL* curl, int option, long value ) ;

FUNCTION-ALIAS: curl_easy_setopt_string
int curl_easy_setopt ( CURL* curl, int option, c-string value ) 

FUNCTION-ALIAS: curl_easy_setopt_pointer
int curl_easy_setopt ( CURL* curl, int option, void* value ) ;

TYPEDEF: int64_t curl_off_t

FUNCTION-ALIAS: curl_easy_setopt_curl_off_t
int curl_easy_setopt ( CURL* curl, int option, curl_off_t value ) ;

: curl_easy_setopt ( curl option value -- code )
    over enum>number {
        { [ dup 30000 > ] [ drop curl_easy_setopt_curl_off_t ] }
        { [ dup 20000 > ] [ drop curl_easy_setopt_pointer ] }
        { [ dup 10000 > ] [ drop curl_easy_setopt_string ] }
        [ drop curl_easy_setopt_long ]
    } cond ;

Factor

We can then begin to use libcurl in a few simple Factor words that allow us to present a nice interface to the user. Starting with initializing the library, and registering a destructor the cleanup after we are done:

DESTRUCTOR: curl_easy_cleanup

: curl-init ( -- CURL )
    curl_easy_init &curl_easy_cleanup ;

Some of the functions produce an error code that we should check.

CONSTANT: CURLE_OK 0

: check-code ( code -- )
    CURLE_OK assert= ;

We can set options using the curl_easy_setopt words we defined earlier:

: curl-set-opt ( CURL key value -- )
    curl_easy_setopt check-code ;

Using these we can set file (opening and registering a destructor to close) and URL options:

CONSTANT: CURLOPT_FILE 10001
CONSTANT: CURLOPT_URL 10002

DESTRUCTOR: fclose

: curl-set-file ( CURL path -- )
    CURLOPT_FILE swap "wb" fopen &fclose curl-set-opt ;

: curl-set-url ( CURL url -- )
    CURLOPT_URL swap present curl-set-opt ;

And a word to perform the "curl":

: curl-perform ( CURL -- )
    curl_easy_perform check-code ;

Putting all of that together, we can finally download a URL to a specified local file path:

: curl-download-to ( url path -- )
    [
        curl-init
        [ swap curl-set-file ]
        [ swap curl-set-url ]
        [ curl-perform ] tri
    ] with-destructors ;

Using it is pretty simple:

IN: scratchpad "http://factorcode.org" "/tmp/factor.html"
               curl-download-to

Wednesday, June 25, 2014

Quicksort

Sorting algorithms are a frequent element to computer science education, conversation amongst programmers, and job interviews. There are many different versions with varying tradeoffs of performance and technique.

I noticed that Rosetta Code has a page on Quicksort implementations. I thought it might make a nice example of translating pseudocode to Factor.

simple quicksort

The "simple quicksort algorithm" has the following pseudocode:

function quicksort(array)
    less, equal, greater := three empty arrays
    if length(array) > 1  
        pivot := select any element of array
        for each x in array
            if x < pivot then add x to less
            if x = pivot then add x to equal
            if x > pivot then add x to greater
        quicksort(less)
        quicksort(greater)
        array := concatenate(less, equal, greater)

We can copy it verbatim using the ability to have named local variables:

:: quicksort ( seq -- sorted-seq )
    seq length 1 > [
        V{ } clone :> less
        V{ } clone :> equal
        V{ } clone :> greater
        seq first :> pivot
        seq [| x |
            x pivot <=> {
                { +lt+ [ x less push ] }
                { +eq+ [ x equal push ] }
                { +gt+ [ x greater push ] }
            } case
        ] each
        less quicksort equal greater quicksort 3append
    ] [ seq ] if ;

Even though local variables can be convenient, we discourage using them if library words or simpler concepts can express the same logic. Noticing that this partitions the sequence, and then joins the parts, we can make it a bit shorter using some available library words:

: quicksort ( seq -- sorted-seq )
    dup empty? [
      unclip [
          '[ _ before? ] partition [ quicksort ] bi@
      ] keep prefix append
    ] unless ;

Neither of these is particularly fast, since they involve the creation of a lot of temporary sequences. There is a better (meaning faster and not really more complex) version available.

better quicksort

The "better quicksort algorithm" is an in-place version that uses swaps to move items into a sorted order. It has the following pseudocode:

function quicksort(array)
    if length(array) > 1
        pivot := select any element of array
        left := first index of array
        right := last index of array
        while left ≤ right
            while array[left] < pivot
                left := left + 1
            while array[right] > pivot
                right := right - 1
            if left ≤ right
                swap array[left] with array[right]
                left := left + 1
                right := right - 1
        quicksort(array from first index to right)
        quicksort(array from left to last index)

We can take a similar translation approach to the first example (using some unsafe words to avoid bounds-checking and mutable local variables) to create this version:

:: (quicksort) ( seq from to -- )
    from to < [
        from to + 2/ seq nth-unsafe :> pivot
        from :> left!
        to :> right!

        [ left right <= ] [
            [ left seq nth-unsafe pivot before? ]
            [ left 1 + left! ] while
            [ right seq nth-unsafe pivot after? ]
            [ right 1 - right! ] while
            left right <= [
                left right seq exchange-unsafe
                left 1 + left!
                right 1 - right!
            ] when
        ] while

        seq from right (quicksort)
        seq left to (quicksort)
    ] when ; inline recursive

: quicksort ( seq -- )
    0 over length 1 - (quicksort) ;

This is faster, although about 3x slower than our current merge sort algorithm. There are probably ways we could make it faster (one I noticed and filed an issue to track that also makes merge sort faster).

I have committed a version of this in the sorting.quick vocabulary that I hope to use for faster in-place sorting in the standard library.

Sunday, June 22, 2014

World Cup

Many people are watching the FIFA World Cup 2014 that is going on right now in Brazil. A few days ago, someone posted a gist for following the World Cup in six lines of Python 3. Several people tried to improve it, down to four lines, then down to one or two lines of code.

Without worrying too much about lines of code, here is something similar in Factor.

: worldcup. ( -- )
    "http://worldcup.sfg.io/matches" http-get nip json>
    [ "status" of "completed" = ] filter
    [
        [ "home_team" of ] [ "away_team" of ] bi
        [ [ "country" of ] [ "goals" of ] bi ] bi@
        "%s %s x %s %s\n" printf
    ] each ;

And if you run it, you'll get something like this:

IN: scratchpad worldcup.
Brazil 3 x Croatia 1
Mexico 1 x Cameroon 0
Spain 1 x Netherlands 5
Chile 3 x Australia 1
Colombia 3 x Greece 0
Ivory Coast 2 x Japan 1
Uruguay 1 x Costa Rica 3
England 1 x Italy 2
Switzerland 2 x Ecuador 1
France 3 x Honduras 0
Argentina 2 x Bosnia and Herzegovina 1
Iran 0 x Nigeria 0
Germany 4 x Portugal 0
Ghana 1 x USA 2
Belgium 2 x Algeria 1
Russia 1 x Korea Republic 1
Brazil 0 x Mexico 0
Cameroon 0 x Croatia 4
Spain 0 x Chile 2
Australia 2 x Netherlands 3
Colombia 2 x Ivory Coast 1
Japan 0 x Greece 0
Uruguay 2 x England 1
Italy 0 x Costa Rica 1
Switzerland 2 x France 5
Honduras 1 x Ecuador 2
Argentina 1 x Iran 0
Nigeria 1 x Bosnia and Herzegovina 0
Germany 2 x Ghana 2

Extra Credit

If we wanted to engineer this a bit more, we could start adding to the example.

First, we could define a tuple class to hold the result of each game. This isn't really necessary, but it can be nice to see all the fields that are available, and to represent it as an object rather than just a hashtable:

TUPLE: game home_team home_team_events home_team_tbd
away_team away_team_events away_team_tbd winner match_number
datetime location status ;

Then we could get all the game results as tuples, using from-slots to convert from an array of hashtable of attributes:

: worldcup ( -- games )
    "http://worldcup.sfg.io/matches" http-get nip json>
    [ game from-slots ] map ;

Next, having fun with colors, we use character styles to print the winner in bold green text.

CONSTANT: winner-style H{
    { foreground COLOR: MediumSeaGreen }
    { font-style bold }
}

And then, using more code than is probably necessary, we print out each team, making sure to format the winner using the style we just defined (using locals for convenience):

: game. ( game -- )
    [let
        [ home_team>> ] [ away_team>> ] [ winner>> ] tri
            :> ( home away winner )

        home "country" of dup winner =
        [ winner-style format ] [ write ] if bl
        home "goals" of number>string write

        " x " write

        away "country" of dup winner =
        [ winner-style format ] [ write ] if bl
        away "goals" of number>string write nl
    ] ;

We want to see the completed games, so we can make a word to filter the list of games.

: completed-games ( games -- games' )
    [ status>> "completed" = ] filter ;

Finally, putting all this together, we make one word to print out all the completed games:

: worldcup. ( -- )
    worldcup completed-games [ game. ] each ;

The code for this is on my GitHub.

Friday, June 13, 2014

Filename Sanitization

I came across the Zaru project that provides filename sanitization for Ruby. You can learn a bit about filenames reading the article on Wikipedia. I thought it might be a nice feature to implement something like this for Factor.

The rules for sanitization are relatively simple, so I will list and then implement each one:

1. Certain characters generally don't mix well with certain file systems, so we filter them:

: filter-special ( str -- str' )
    [ "/\\?*:|\"<>" member? not ] filter ;

2. ASCII control characters (0x00 to 0x1f) are not usually a good idea, either:

: filter-control ( str -- str' )
    [ control? not ] filter ;

3. Unicode whitespace is trimmed from the beginning and end of the filename and collapsed to a single space within the filename:

: filter-blanks ( str -- str' )
    [ blank? ] split-when harvest " " join ;

4. Certain filenames are reserved on Windows and are filtered (substituting a "file" placeholder name):

: filter-windows-reserved ( str -- str' )
    dup >upper {
        "CON" "PRN" "AUX" "NUL" "COM1" "COM2" "COM3" "COM4"
        "COM5" "COM6" "COM7" "COM8" "COM9" "LPT1" "LPT2" "LPT3"
        "LPT4" "LPT5" "LPT6" "LPT7" "LPT8" "LPT9"
    } member? [ drop "file" ] when ;

5. Empty filenames are not allowed, replaced instead with file:

: filter-empty ( str -- str' )
    [ "file" ] when-empty ;

6. Filenames that begin with only a "dot" character are replaced with file:

: filter-dots ( str -- str' )
    dup first CHAR: . = [ "file" prepend ] when ;

Putting it all together, and requiring the filename to be no more than 255 characters:

: sanitize-path ( path -- path' )
    filter-special
    filter-control
    filter-blanks
    filter-windows-reserved
    filter-empty
    filter-dots
    255 short head ;

The code for this (and some tests) is on my GitHub.

Tuesday, June 10, 2014

Swift Ranges

Looking at the documentation for the Swift programming language recently released by Apple, I noticed they have support for integer ranges, similar to how the range objects work in Factor.

In Swift, you can get a range of the integers 2 through 6 by doing 2...6 and the integers 2 through 5 by doing 2..6. Notice the use of three or two dots to indicate whether the range includes the last number, or not, respectively.

I thought it would be fun to implement a similar syntax for Factor.

First, you can show that:

IN: scratchpad 2 6 [a,b) >array .
{ 2 3 4 5 }

IN: scratchpad 2 6 [a,b] >array .
{ 2 3 4 5 6 }

Similar to how we implemented fat arrows (also known as "pair rockets" or "hash rockets"), we can define the following syntax words:

SYNTAX: .. dup pop scan-object [a,b) suffix! ;

SYNTAX: ... dup pop scan-object [a,b] suffix! ;

And then use them:

IN: scratchpad 2 .. 6 >array .
{ 2 3 4 5 }

IN: scratchpad 2 ... 6 >array .
{ 2 3 4 5 6 }

Comparing k-NN in Factor

Recently a pair of blog posts compared implementations of a k-nearest neighbour (k-NN) classifier in F# and OCaml. Subsequently an implementation showing performance in Rust got my attention and I thought it might be nice to demonstrate a version in Factor.

The first OCaml version is 30 lines of code and takes 21 seconds on my laptop:

$ sloccount classifyDigits.ml
ml:              30 (100.00%)

$ time ./classifyDigits
Percentage correct:94.400000

real 0m21.292s
user 0m21.152s
sys 0m0.120s

The second OCaml version is 47 lines of code and takes 12 seconds:

$ sloccount classifyDigitsArray.ml
ml:              47 (100.00%)

$ time ./classifyDigitsArray 
Percentage correct:94.400000

real 0m12.563s
user 0m12.434s
sys 0m0.120s

Note: I couldn't get the parallel version to run, but would assume it to have the same 2x speedup that the author saw.

Simple

It is often useful to start with the simplest possible code before trying to optimize for performance. I decided to parse the training and validation files (containing comma-separated values, the first of which is the label and the subsequent values are observations) into an array of arrays.

: slurp-file ( path -- {pixels,label} )
    ascii file-lines rest [
        "," split [ string>number ] map unclip 2array
    ] map ;

: classify ( training pixels -- label )
    '[ first _ distance ] infimum-by second ;

: k-nn ( -- )
    "~/trainingsample.csv" slurp-file
    "~/validationsample.csv" slurp-file
    [ [ first2 [ classify ] [ = ] bi* ] with count ]
    [ length ] bi / 100.0 * "Percentage correct: %.1f\n" printf ;

You can see that it produces the desired output of 94.4% correct, and takes about 40 seconds on my laptop.

IN: scratchpad gc [ k-nn ] time
Percentage correct: 94.4
Running time: 40.283777984 seconds

Not too bad for 11 lines of simple code, but slower than it could be. Much of the performance penalty in this version is due to the large amount of generic dispatch, which is something we hope to reduce in future versions of Factor.

Faster

I noticed all the observed values were in the range [0-255], so thought a simple speedup might be to store them in a byte-array, and instead of using the builtin distance word, make my own that specializes on byte-arrays.

: slurp-file ( path -- {pixels,label} )
    ascii file-lines rest [
        "," split [ string>number ] B{ } map-as unclip 2array
    ] map ;

: distance ( x y -- z )
    { byte-array byte-array } declare 0 [ - sq + ] 2reduce ;

: classify ( training pixels -- label )
    '[ first _ distance ] infimum-by second ;

: k-nn ( -- )
    "~/trainingsample.csv" slurp-file
    "~/validationsample.csv" slurp-file
    [ [ first2 [ classify ] [ = ] bi* ] with count ]
    [ length ] bi / 100.0 * "Percentage correct: %f\n" printf ;

With that simple change, we get 7x faster than our previous version and roughly as fast as the fastest parallel OCaml version!

IN: scratchpad gc [ k-nn ] time
Percentage correct: 94.400000
Running time: 5.708627884 seconds

The Rust version requires a nightly build and I haven't had a chance to test it, but I assume it is a bit faster, and discussions on r/rust, r/programming, and Hacker News show some fast versions in C++ and D as well.

The code for this is in my GitHub.

Thursday, June 5, 2014

2 + 2 = 5

There is an old programmer joke that wonders if 2 + 2 = 5 for very large values of 2 (someone even made it into a fun T-shirt).

Well, a challenge on StackExchange to write a program that makes 2 + 2 = 5 caught my eye. I wondered what a solution might look like in Factor.

If you run this code:

IN: scratchpad << "\x32" create-in 5/2 define-constant >>

Then... whoa!

IN: scratchpad 2 2 + .
5

As it turns out, a little bit ago I noticed that you can redefine numbers this way and filed a bug to start a conversation about this "feature".

This exploits the parser, particularly the parse-datum word which searches a token for an already defined word, then if not found, tries to parse it as a number. Usually, we disallow words from being defined by a number using scan-word-name, but that doesn't prevent you from doing it yourself as in the example above.

P.S., in the spirit of the Haskell solution (and anyone else that craves infix notation):

CONSTANT: 2+2 5

IN: scratchpad 2+2 .
5

Monday, June 2, 2014

Pagination

Most of you have used the pagination on various websites, usually in the context of search results or forum posts. I thought it would be fun to build a simple "paginator", using Factor.

For example, if you are on page 23 of 28 total pages, you might see something like this, where you show the selected page and other pages that you can quickly link to:

<< 1 2 ... 21 22 [23] 24 25 ... 27 28 >>

Creating a specification from this, our goal will be to show:

  • the first two pages
  • the selected page (with two pages before and after)
  • the last two pages

Using the output>array smart combinator (and lexical variables), we can generate a sequence of page numbers, filtered to make sure we only allow valid page numbers between 1 and #pages:

:: pages-to-show ( page #pages -- seq )
    [
        1 2 page {
            [ 2 - ]
            [ 1 - ]
            [ ]
            [ 1 + ]
            [ 2 + ]
        } cleave #pages [ 1 - ] keep
    ] output>array members
    [ 1 #pages between? ] filter ;

Some unit tests demonstrate that this works for our "spec" pretty well:

{ { 1 2 3 99 100 } } [ 1 100 pages-to-show ] unit-test
{ { 1 2 21 22 23 24 25 27 28 } } [ 23 28 pages-to-show ] unit-test
{ { 1 2 3 } } [ 1 3 pages-to-show ] unit-test

Lastly, we can split the page numbers to display ellipsis on gaps, and print something like our original goal above:

:: pages-to-show. ( page #pages -- )
    page #pages pages-to-show
    [ swap - 1 = ] monotonic-split { f } join
    [
        [
            [ number>string ]
            [ page = [ "[" "]" surround ] when ] bi
        ] [ "..." ] if*
    ] map " " join "<< " " >>" surround print ;

See, it works!

IN: scratchpad 1 100 pages-to-show.
<< [1] 2 3 ... 99 100 >>

IN: scratchpad 23 28 pages-to-show.
<< 1 2 ... 21 22 [23] 24 25 ... 27 28 >>

IN: scratchpad 1 3 pages-to-show.
<< [1] 2 3 >>

Using this in a web application is left as an exercise for the reader, although it might be nice to create a furnace.pagination vocabulary that automatically handles this in our web framework.

You can find this code on my GitHub.

Thursday, April 24, 2014

Instant-runoff Voting

Recently, I had a conversation with a friend in Australia who told me about the voting system used in most of their elections: instant-runoff voting.

Instead of voting for a single candidate, you rank candidates in the order of preference. This ranking system is used to choose a best candidate.

  1. Count each person's most preferred candidate.
  2. The winning candidate must have more than 50% of the votes.
  3. Otherwise, remove the candidate with the least number of overall votes, and try again.

Let's implement a voting system like this in Factor.

Assuming voters provide an ordered list of candidates, we can count everyone's top candidate:

: count-votes ( votes -- total )
    [ first ] histogram-by sort-values ;

A candidate wins the election if he has a simple majority (more than 50%) of the votes:

: choose-winner ( votes total -- winner/f )
    last first2 rot length 2/ > [ drop f ] unless ;

If the candidate with the most votes did not achieve a majority of the votes, we remove all votes for the candidate with the least number of votes:

: remove-loser ( votes total -- newvotes )
    first first swap [ remove ] with map ;

The full implementation of our instant-runoff voting system:

: instant-runoff ( votes -- winner )
    dup count-votes 2dup choose-winner
    [ 2nip ] [ remove-loser instant-runoff ] if* ;

One improvement we could make would be to support versions of this model that do not require voters to rank all the candidates (an assumption that the code above makes).

The code for this is on my GitHub.

Tuesday, April 22, 2014

Checksum Improvements

Just a quick update, some checksum improvements have been contributed to Factor.

Some new checksums have been implemented:

And some checksum performance has been improved:

  • checksums.md5 is a lot faster (benchmark is 0.080 vs 0.583 seconds)
  • checksums.sha is a bit faster (benchmark is 0.418 vs 0.686 seconds)

You can find these changes (and more!) in the development version of Factor.

Monday, April 21, 2014

Scraping Re: Factor

For today's post, I thought it would be fun to build a little interface to my blog, Re: Factor. In addition, you can use this to easily scrape any Blogger website.

Scraping

A simple way to make URLs that are relative to my blogs domain:

: re-factor-url ( str -- url )
    "http://re-factor.blogspot.com/" prepend ;

Using that to get a URL that returns all of the posts as JSON objects.

: posts-url ( -- url )
    "feeds/posts/default?alt=json&max-results=200" re-factor-url ;
Note: we limit the results to 200 posts. Since this is my 190th post, that will work for a little while longer but that limit might need to be bumped up in the future. :-)

Retrieving all of the posts is easy using our HTTP client and parsing the response. Since my posts don't change that frequently, for convenience we will memoize the list result.

MEMO: all-posts ( -- posts )
    posts-url http-get nip json> { "feed" "entry" } [ of ] each ;

Displaying

A simple way to display a list of posts is to display the title of each post and link it to the URL of each post (allowing us to right-click open URLs in the listener).

CONSTANT: post-style H{ { foreground COLOR: blue } }

: posts. ( -- )
    all-posts [
        [ "title" of "$t" of ] [ "link" of ] bi
        over '[ "title" of _ = ] find nip "href" of
        >url post-style [ write-object ] with-style nl
    ] each ;

For individual posts, we will use the html.parser.printer vocabulary to parse the HTML content and display it as text. The conversion to text right now is not perfect, but works okay for most things.

We print the title of the post and a dashed line underneath:

: post-title. ( post -- )
    { "title" "$t" } [ of ] each
    [ print ] [ length CHAR: - <string> print ] bi nl ;

We print the content by rendering the HTML into a string of text, then cleaning up extra whitespace and HTML escapes (using the new html.entities vocabulary), and wrapping the paragraphs.

: post-content. ( post -- )
    { "content" "$t" } [ of ] each
    parse-html html-text html-unescape string-lines [
        [ blank? not ] cut-when
        [ write ] [ 70 wrap-string print ] bi*
    ] each ;

Putting those together, to display a post:

: post. ( n -- )
    all-posts nth [ post-title. ] [ post-content. ] bi ;

The code for this is on my GitHub.

Saturday, April 5, 2014

Speedtest

Many people are familiar with Speedtest.net, which is used to test a network connection, displaying download speeds, upload speeds, and server latency. Implemented as a Flash-based interface, it can be used from a web browser to verify your internet provider is giving you what you pay for.

You might not be aware that the speedtest-cli project provides a way to check internet speed from the command line in a similar manner.

I thought it might be fun to implement an interface to Speedtest.net using Factor:

Closest Servers

Speedtest provides a list of available servers all over the world that can be used for testing, returned as XML. After parsing the XML document, we use a utility method to extract attributes for each server into an array:

: attr-map ( tag -- attrs )
    attrs>> [ [ main>> ] dip ] H{ } assoc-map-as ;

: speedtest-servers ( -- servers )
    "http://www.speedtest.net/speedtest-servers.php"
    http-get nip string>xml
    "server" deep-tags-named [ attr-map ] map ;

Calculating the geographical distance between two points, specified by latitude and longitude:

: radians ( degrees -- radians ) pi * 180 /f ; inline

:: geo-distance ( lat1 lon1 lat2 lon2 -- distance )
    6371 :> radius ! km
    lat2 lat1 - radians :> dlat
    lon2 lon1 - radians :> dlon
    dlat 2 / sin sq dlon 2 / sin sq
    lat1 radians cos lat2 radians cos * * + :> a
    a sqrt 1 a - sqrt fatan2 2 * :> c
    radius c * ;

This lets us find the closest server to a given geographic location:

: lat/lon ( assoc -- lat lon )
    [ "lat" of ] [ "lon" of ] bi [ string>number ] bi@ ;

: server-distance ( server lat lon -- server )
    '[ lat/lon _ _ geo-distance "distance" ] keep
    [ set-at ] keep ;

: closest-servers-to ( lat lon -- servers )
    [ speedtest-servers ] 2dip '[ _ _ server-distance ] map
    [ "distance" of ] sort-with ;

The available Speedtest configuration provides our latitude and longitude, allowing us to sort the server list by geographic distance:

TUPLE: config client times download upload ;

C: <config> config

: speedtest-config ( -- config )
    "http://www.speedtest.net/speedtest-config.php"
    http-get nip string>xml {
        [ "client" deep-tag-named attr-map ]
        [ "times" deep-tag-named attr-map ]
        [ "download" deep-tag-named attr-map ]
        [ "upload" deep-tag-named attr-map ]
    } cleave <config> ;

: closest-servers ( -- servers )
    speedtest-config client>> lat/lon closest-servers-to ;

Best Server

We can calculate latency by downloading a small latency.txt file and timing how long it takes:

: (server-latency) ( server -- ms )
    "url" of >url URL" latency.txt" derive-url
    [ http-get nip "test=test\n" = ] benchmark 1,000,000 /f
    3,600,000 ? ;

After calculating latency, we save it for later use:

: server-latency ( server -- server )
    [ (server-latency) "latency" ] keep [ set-at ] keep ;

The "best" server that we will use for testing is the one with the lowest latency, checking the five closest servers to our location:

: best-server ( -- server )
    closest-servers 5 short head
    [ server-latency ] parallel-map
    [ "latency" of ] sort-with first ;

Upload Speed

To calculate upload speed, we upload several document sizes (filling the content with zeroes) and time how long it takes:

: upload-data ( size -- data )
    9 - CHAR: 0 <string> "content1=" prepend ;

: (upload-speed) ( server -- Mbps )
    "url" of >url { 250,000 500,000 } [
        [
            upload-data [ swap http-put 2drop ] keep length
        ] with map-sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;

After calculating upload speed, we save it for later use:

: upload-speed ( server -- server )
    [ (upload-speed) "upload" ] keep [ set-at ] keep ;

Download Speed

To calculate download speed, we download several files with varying sizes in parallel and time how long it takes:

: download-urls ( server -- urls )
    "url" { 350 500 750 1000 } 
    [ dup "random%sx%s.jpg" sprintf >url derive-url ] with map ;

: (download-speed) ( server -- Mbps )
    download-urls 4 swap <array> [
        [ [ http-get nip length ] map-sum ] parallel-map sum
    ] benchmark 1,000,000,000 /f / 8 * 1,000,000 / ;

After calculating download speed, we save it for later use:

: download-speed ( server -- server )
    [ (download-speed) "download" ] keep [ set-at ] keep ;

Text Results

With all of that built, we can build a word to run a Speedtest, printing out the results as text:

: run-speedtest ( -- server )
    "Selecting best server based on ping..." print flush
    best-server dup {
        [ "sponsor" of ]
        [ "name" of ]
        [ "distance" of ]
        [ "latency" of ]
    } cleave "Hosted by %s (%s) [%0.2f km]: %s ms\n" printf
    "Testing download speed" print flush download-speed
    dup "download" of "Download: %0.2f Mbit/s\n" printf
    "Testing upload speed" print flush upload-speed
    dup "upload" of "Upload: %0.2f Mbit/s\n" printf ;

Graphic Results

It would be nice if we could show the reports graphically, and as it turns out, its not too hard. We just have to upload the results to speedtest.net in the same way their Flash application does, and then display the image that is created for you.

: make-result ( server -- result )
    [
        {
            [ "download" of 1,000 * >integer "download" ,, ]
            [ "latency" of >integer "ping" ,, ]
            [ "upload" of 1,000 * >integer "upload" ,, ]
            [ drop "" "promo" ,, ]
            [ drop "pingselect" "startmode" ,, ]
            [ "id" of "recommendedserverid" ,, ]
            [ drop "1" "accuracy" ,, ]
            [ "id" of "serverid" ,, ]
            [
                [ "latency" of ]
                [ "upload" of 1,000 * ]
                [ "download" of 1,000 * ] tri
                "%d-%d-%d-297aae72" sprintf md5 checksum-bytes
                hex-string "hash" ,,
            ]
        } cleave
    ] { } make ;

: submit-result ( server -- result-id )
    make-result "http://www.speedtest.net/api/api.php"
    <post-request> [
        [
            "http://c.speedtest.net/flash/speedtest.swf"
            "referer"
        ] dip header>> set-at
    ] keep http-request nip query>assoc "resultid" of ;

Speedtest

Putting this all together, we can run the Speedtest, submit the results, then display the test results as an image.

: speedtest ( -- )
    run-speedtest submit-result "Share results: " write
    "http://www.speedtest.net/result/%s.png" sprintf
    [ dup >url write-object nl ] [ http-image. ] bi ;

Some things that I would like to improve:

  • The Speedtest configuration actually specifies the details of download and upload sizes, the amount of parallelism, and the duration of the test, we should use it.
  • The http-get word needs an overall timeout so we can scale between very slow and very fast connection speeds.
  • The Speedtest graphical result images are "retina" when viewed in the web browser, but are not when downloaded from Factor or wget.
  • Factor needs an easier way to create a queue of work that is processed by several worker threads, for convenience I just used one of the concurrent combinators.

The code for this is on my GitHub.