Tuesday, May 17, 2011

Fibonacci Wars

Inspired by a recent article about the "worst algorithm in the world" for calculating Fibonacci numbers using Python, I wanted to show a few implementations in Factor.

Recursive

Most people should be familiar with the simple definition for calculating the Fibonacci numbers:

Some recursive implementations of this algorithm suffer from very slow performance due to the nature of hugely recursive functions, making them unsuitable for larger Fibonacci numbers. Thankfully, it is easy to implement caching using the memoize vocabulary:

MEMO: slow-fib ( m -- n )
    dup 0 >= [ throw ] unless
    dup 2 >= [
        [ 2 - slow-fib ] [ 1 - slow-fib ] bi +
    ] when ;

Iterative

A better way would be to implement an iterative solution, that simply keeps the current and previous Fibonacci number on the stack, adding them the specified number of times:

: okay-fib ( m -- n )
    dup 0 >= [ throw ] unless
    [ 0 1 ] dip [ [ + ] [ drop ] 2bi ] times drop ;

Magical

A much better way (introduced by the original article) suggests taking advantage of a matrix expression for the Fibonacci numbers:

Using this, we can implement a faster version in Factor (using locals to make it a bit more readable):

:: fast-fib ( m -- n )
    m 0 >= [ m throw ] unless
    m 2 >base [ CHAR: 1 = ] { } map-as :> bits
    1 :> a! 0 :> b! 1 :> c!
    bits [
        [
            a c + b *
            b sq c sq +
        ] [
            a sq b sq +
            a c + b *
        ] if b! a! a b + c!
    ] each b ;

Performance

Some simple performance numbers show how much faster the "magical" version is, particularly for calculating large Fibonacci numbers (such as the the one-millionth number in the sequence).

10,000 100,000 1,000,000
slow-fib 0.0097 -- --
okay-fib 0.0053 0.2560 25.9608
fast-fib 0.0001 0.0076 0.4851

It's worth pointing out that Factor is about as fast as Python 2.6 for the "magical" calculation. In Python 2.7 and later, certain improvements were made to their large number support, resulting in Python being about 3x faster.

However, if you want real performance, you can use C and the GNU Multiple Precision Arithmetic Library to implement the iterative algorithm (about 3 times faster than Factor), the magical algorithm (about 28 times faster than Factor), or use the builtin mpz_fib_ui function (about 44 times faster than Factor).

The code for this (as well as several version in C) are on my Github.

Sunday, May 15, 2011

Google Search

A couple months ago, I implemented wrappers for the Google Charts and Google Translate API's. Today, I'd like to implement a wrapper for Google Search.

Search

First, we should build a word that, given a query, returns a URL to retrieve Google Search results (formatted as JSON):

: search-url ( query -- url )
    URL" http://ajax.googleapis.com/ajax/services/search/web"
        "1.0" "v" set-query-param
        swap "q" set-query-param
        "8" "rsz" set-query-param
        "0" "start" set-query-param ;

We can define a tuple class to hold the attributes every search result should have:

TUPLE: search-result cacheUrl GsearchResultClass visibleUrl
title content unescapedUrl url titleNoFormatting ;

Using some code to set attributes dynamically, we can perform a Google Search, and parse the results into a sequence of search-result objects.

: http-search ( query -- results )
    search-url http-get nip json>
    { "responseData" "results" } [ swap at ] each
    [ \ search-result from-slots ] map ;

Display

We can build some simple words that can be used to format the output of a search:

: write-heading ( str -- )
    H{ 
        { font-size 14 }
        { background COLOR: light-gray }
    } format nl ;

: write-title ( str -- )
    H{ 
        { foreground COLOR: blue }
    } format nl ;

: write-content ( str -- )
    60 wrap-string print ;

: write-url ( str -- )
    dup >url H{ 
        { font-name "monospace" }
        { foreground COLOR: dark-green }
    } [ write-object ] with-style nl ;

And then create a word to perform a search and display the results. If you are using my webbrowser vocabulary, you can open the URL's in a webbrowser directly from Factor.

: http-search. ( query -- )
    [ "Search results for '%s'" sprintf write-heading nl ]
    [ http-search ] bi [
        {
            [ titleNoFormatting>> write-title ]
            [ content>> write-content ]
            [ unescapedUrl>> write-url ]
        } cleave nl
    ] each ;

Try it out

If everything works correctly, you should be able to perform a search in the Listener (e.g., searching for "factor"):


Some things we might do to improve this:

  • add paging, to the next or previous page of search results
  • highlight in bold the searched words in the content
  • unescape the HTML entities in the content
  • build a lightweight GUI to render the results

The code for this is on my Github.

Thursday, May 5, 2011

Open URL (in Listener)

In January, I implemented a cross-platform open-url feature to allow opening URLs from Factor. One major problem was that I couldn't figure out how to make URLs "clickable" in the Listener.

Then, I learned that the ui.operations vocabulary allows us to add "capabilities" that get triggered when certain types are rendered in the Listener. It turns out to be pretty easy, actually:

[ url? ] \ open-url H{ } define-operation

Using that, you can right-click on URL objects, and choose to open them in your web browser:


I also made a small patch to add this support to URLs that are rendered in the help system or in the slides vocabulary.

Note: if you want this operation to be the "primary operation" triggered when the URL is left-clicked, you can add { +primary+ t } to the hashtable used in the operation's definition.

Sunday, May 1, 2011

Reddit Stats

I thought it would be fun to track the development of the Factor programming language using Reddit. Of course, to do this, I wanted to use Factor.

A few months ago, I implemented a Reddit "Top" program. We can use the code for that as a basis for showing, using Reddit scores, how the activity (or popularity?) of a couple Factor blogs have changed over the last few years:


Stories

We start by creating a story tuple that will hold all of the properties that Reddit returns for each posted link (including the score -- up votes minus down votes -- which we will be using).

TUPLE: story author clicked created created_utc domain downs
hidden id is_self levenshtein likes media media_embed name
num_comments over_18 permalink saved score selftext
selftext_html subreddit subreddit_id thumbnail title ups url ;

And a word to parse a "data" result into a story object:

: parse-story ( assoc -- obj )
    "data" swap at \ story from-slots ;

Paging

The Reddit API provides results as a series of "pages" (until all results are exhausted). This is pretty similar to how the website works, so it should be fairly easy to understand the mechanics. We will define a page object that holds the current URL, the results from the current page, and links to the pages before and after the current page:

TUPLE: page url results before after ;

We can then build a word to fetch the page (as a JSON response), parse it, and construct a page object:

: json-page ( url -- page )
    >url dup http-get nip json> "data" swap at {
        [ "children" swap at [ parse-story ] map ]
        [ "before" swap at [ f ] when-json-null ]
        [ "after" swap at [ f ] when-json-null ]
    } cleave \ page boa ;

An easy "paging" mechanism that, given a page of results, can return the next page:

: next-page ( page -- page' )
    [ url>> ] [ after>> "after" set-query-param ] bi json-page ;

And, using the make vocabulary, construct all the results into a sequence:

: all-pages ( page -- results )
    [
        [ [ results>> , ] [ dup after>> ] bi ]
        [ next-page ] while drop
    ] { } make concat ;

Domain Stats

We can retrieve all the results and produces a map of years that links were posted to a sum of the scores for each year's links:

: domain-stats ( domain -- stats )
    "http://api.reddit.com/domain/%s" sprintf json-page all-pages
    [ created>> 1000 * millis>timestamp year>> ] group-by
    [ [ score>> ] map-sum ] assoc-map ;

Finally, a word to display a chart of the scores across a given list of domains:

: domains. ( domains -- )
    H{ } clone [
        '[ domain-stats [ swap _ at+ ] assoc-each ] each
    ] keep >alist sort-keys <bar> 160 >>height chart. ;

The code for this is available on my Github.