Monday, November 29, 2010

Estimating CPU Speed

Factor contains a nice DSL for writing assembly code. I thought it would be fun to investigate how it works by accessing the CPU's Time Stamp Counter to estimate CPU speed.

The X86 instruction for accessing the timestamp value (incremented every CPU tick) is called RDTSC (a 2-byte instruction 0x0f 0x31). Some C code for calling the 32-bit or 64-bit versions of this looks like:

#if defined(__i386__)

static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long int x;
    __asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
    return x;
}

#elif defined(__x86_64__)

static __inline__ unsigned long long rdtsc(void)
{
    unsigned long long hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

#endif

Factor provides utilities for calling arbitrary assembly code in the alien vocabulary. Using this, we can create corresponding Factor code (supporting both 32 and 64 bits):

USING: alien alien.c-types cpu.x86.assembler
cpu.x86.assembler.operands system ;

HOOK: rdtsc cpu ( -- n )

M: x86.32 rdtsc
    longlong { } cdecl [
        RDTSC
    ] alien-assembly ;

M: x86.64 rdtsc
    longlong { } cdecl [
        RAX 0 MOV
        RDTSC
        RDX 32 SHL
        RAX RDX OR
    ] alien-assembly ;

You can see in the implementation above how Factor uses the type of CPU (contained in the cpu variable) to dispatch on the correct version of the rdtsc word.

To estimate CPU speed, we will need to define two "benchmarking" words:

  1. Calculate the number of CPU ticks it takes to execute some Factor code.
  2. Calculate the time it takes to execute some Factor code.
USING: kernel math system ;

: #ticks ( quot -- n )
    rdtsc [ call rdtsc ] dip - ; inline

: #nanos ( quot -- n )
    nano-count [ call nano-count ] dip - ; inline

We can then create a "busy loop" that runs for some time, then estimates CPU speed as ticks-per-second:

: busy-loop ( -- )
    100000000 [ 1 - dup 0 > ] loop drop ;

: cpu-speed ( -- n )
    [ [ busy-loop ] #nanos ] #ticks swap / 1000000000.0 * ;

Running this on my MacBook Pro (with a 2.66 GHz processor) produces this estimate:

( scratchpad ) cpu-speed .
2660324566.190773

The rdtsc and #ticks words are distributed with Factor as instruction-count and count-instructions and are available in the cpu.x86.features vocabulary. The #nanos word is called benchmark and is available in the tools.time vocabulary.

Wednesday, November 17, 2010

Github supports Factor!

I've been working with Github over the past three months to get Factor added as a recognized programming language. The main roadblock was proper syntax highlighting and deployment within the Github environment. Since Github uses the pygments project, I implemented (and contributed) a Factor lexer which will be part of the next release of pygments (version 1.4) and is available now on their development branch.

As of today, Github has officially added support for the Factor programming language. In celebration of this, I wanted to highlight some of the projects using Factor on Github. As you might know, I've been keeping code examples and tutorials from my blog posts on my Github.

The main repository (where a lot of the library development occurs) has 114 watches, 47 forks, and at least 17 unofficial copies.

There are a variety of other projects and tutorials:

Trixel (low resolution voxel renderer)

Implementation of Joy in Factor

Factor Blog Tutorial

99 Prolog Problems in Factor

Factor Training Ground

Sunlight Labs API in Factor

BERT in Factor

Static Blog Generator

Go implemented in Factor

Controlling a Robotino from Factor

Math and crypto code.

Sensors and Radar

Turtles, SVG, Image gadgets

Graphing gadgets

Facile: a CouchDB view server in Factor

Some patches (probably a little outdated):

New regexp minimalization code for Factor

Korean I/O support.

X input method

And, some related projects:

Packaging Factor for ArchLinux

Port of Factor to the Parrot Virtual Machine

If you know of other Factor projects with source code, let me know!

By the way, repositories get classified on push, so as the Factor projects get pushed to, they will be added to the list.

Thursday, October 28, 2010

Syntax Highlighting

Update: I modified the code examples below to use the new colors.hex vocabulary.

It's a sometimes overlooked feature that Factor contains a syntax highlighting vocabulary that supports a decent number of programming languages. The vocabulary is called xmode, and is used to highlight entries submitted to Factor's pastebin.

I thought it would be fun to use xmode to implement syntax highlighting in the listener. When we're done, it will look something like this:


First, we will need to use several vocabularies:

USING: accessors assocs colors io io.encodings.utf8 io.files
io.styles kernel literals locals math math.parser sequences
xmode.catalog xmode.marker ;

We define named styles that will apply to the tokens that are parsed by the syntax highlighter. For convenience, we will use the colors.hex vocabulary to convert hexadecimal values to color tuples to build the stylesheet. In the stylesheet below, we use the same colors that are used in the pastebin.

CONSTANT: STYLES H{
    { "NULL"     H{ { foreground HEXCOLOR: 000000 } } }
    { "COMMENT1" H{ { foreground HEXCOLOR: cc0000 } } }
    { "COMMENT2" H{ { foreground HEXCOLOR: ff8400 } } }
    { "COMMENT3" H{ { foreground HEXCOLOR: 6600cc } } }
    { "COMMENT4" H{ { foreground HEXCOLOR: cc6600 } } }
    { "DIGIT"    H{ { foreground HEXCOLOR: ff0000 } } }
    { "FUNCTION" H{ { foreground HEXCOLOR: 9966ff } } }
    { "INVALID"  H{ { background HEXCOLOR: ffffcc }
                    { foreground HEXCOLOR: ff0066 } } }
    { "KEYWORD1" H{ { foreground HEXCOLOR: 006699 }
                    { font-style bold } } }
    { "KEYWORD2" H{ { foreground HEXCOLOR: 009966 }
                    { font-style bold } } }
    { "KEYWORD3" H{ { foreground HEXCOLOR: 0099ff }
                    { font-style bold } } }
    { "KEYWORD4" H{ { foreground HEXCOLOR: 66ccff }
                    { font-style bold } } }
    { "LABEL"    H{ { foreground HEXCOLOR: 02b902 } } }
    { "LITERAL1" H{ { foreground HEXCOLOR: ff00cc } } }
    { "LITERAL2" H{ { foreground HEXCOLOR: cc00cc } } }
    { "LITERAL3" H{ { foreground HEXCOLOR: 9900cc } } }
    { "LITERAL4" H{ { foreground HEXCOLOR: 6600cc } } }
    { "MARKUP"   H{ { foreground HEXCOLOR: 0000ff } } }
    { "OPERATOR" H{ { foreground HEXCOLOR: 000000 }
                    { font-style bold } } }
}

The xmode.catalog vocabulary provides support for looking up the type (or "mode") of the file. The xmode.marker vocabulary then provides support for converting each line into a stream of tokens. Each token allows access to a named style. Once we have the name of the appropriate style, we can then look it up and format the output.

Putting that all together, we can implement the highlight. word.

: highlight-tokens ( tokens -- )
    [
        [ str>> ] [ id>> ] bi
        [ name>> STYLES at ] [ f ] if* format
    ] each nl ;

: highlight-lines ( lines mode -- )
    [ f ] 2dip load-mode [
        tokenize-line highlight-tokens
    ] curry each drop ;

:: highlight. ( path -- )
    path utf8 file-lines [
        path over first find-mode highlight-lines
    ] unless-empty ;

You should be able to paste the above code examples into your listener, to try it for yourself.

Monday, October 25, 2010

Trending Github Projects

This weekend someone posted an article describing using Python with YQL to parse Repopular to retrieve a list of popular Github projects. Since mashups are all the rage these days, I thought I would implement this in Factor.


The Yahoo! way

The YQL that was used in the original article is:

use 'http://yqlblog.net/samples/data.html.cssselect.xml'
  as data.html.cssselect;

select * from data.html.cssselect
where url="repopular.com"
  and css="div.pad a"

We can use the http.client to send this query to Yahoo!, parse the returned JSON data using json.reader, and extract the HREFs to the popular projects. We can then filter them for the links which point to Github.

USING: assocs http.client json.reader kernel sequences ;

: the-yahoo-way ( -- seq )
    "http://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyqlb
log.net%2Fsamples%2Fdata.html.cssselect.xml'%20as%20data.html.cssselect%
3B%20select%20*%20from%20data.html.cssselect%20where%20url%3D%22repopula
r.com%22%20and%20css%3D%22div.pad%20a%22&format=json&diagnostics=true&ca
llback="
    http-get nip json> { "query" "results" "results" "a" }
    [ swap at ] each [ "href" swap at ] map 
    [ "http://github.com" head? ] filter ;

We can run this to see what the current trending projects are on Github.

( scratchpad ) the-yahoo-way [ . ] each
"http://github.com/sinatra/sinatra"
"http://github.com/Sutto/barista"
"http://github.com/pypy/pypy"
"http://github.com/dysinger/apparatus"
"http://github.com/videlalvaro/Thumper"
"http://github.com/alunny/sleight"
"http://github.com/vimpr/vimperator-plugins"

The other way

We can use the html.parser vocabulary to do it another way. Given some knowledge of the HTML returned by Repopular, we can extract the HREFs directly:

USING: accessors assocs html.parser http.client kernel sequences ;

: the-other-way ( -- seq )
    "http://repopular.com" http-get nip parse-html
    [ [ name>> "aside" = ] find drop ]
    [ [ name>> "aside" = ] find-last drop ]
    [ <slice> ] tri
    [ name>> "a" = ] filter
    [ attributes>> "href" swap at ] map
    [ "http://github.com" head? ] filter ;

We can see it produces the same results:

( scratchpad ) the-other-way [ . ] each
"http://github.com/sinatra/sinatra"
"http://github.com/Sutto/barista"
"http://github.com/pypy/pypy"
"http://github.com/dysinger/apparatus"
"http://github.com/videlalvaro/Thumper"
"http://github.com/alunny/sleight"
"http://github.com/vimpr/vimperator-plugins"

Sunday, October 24, 2010

Longest Palindrome

Greplin issued a programming challenge recently, where the first question involved finding the longest palindrome substring. It was then posted as a challenge to the Programming Praxis blog, and I thought I would contribute a solution in Factor.

First, some vocabularies that we will be using:

USING: fry kernel locals make math.ranges sequences unicode.case
unicode.categories ;

As part of the Factor language tutorial, the first program many people write in Factor is a word for detecting if something is a palindrome. The implementation of palindrome? (extended to support case-insensitive comparisons using the unicode vocabulary) looks like this:

: normalize ( str -- str' ) [ Letter? ] filter >lower ;

: palindrome? ( str -- ? ) normalize dup reverse = ;

The "obvious" (but not that fast) way to solve the problem is to examine every possible substring, adding to a list if it is a palindrome. The list of palindrome substrings can then be used to answer the question. This is how we'll implement it.

I thought it would be useful to split the problem into two steps. First, we need a way to enumerate all possible substrings (not including the "empty" substring), applying a quotation to each in turn.

:: each-subseq ( ... seq quot: ( ... x -- ... ) -- ... )
    seq length [0,b] [
        :> from
        from seq length (a,b] [
            :> to
            from to seq subseq quot call( x -- )
        ] each
    ] each ;

You can try this out in the listener, to see how it works:

( scratchpad ) "abc" [ . ] each-subseq
"a"
"ab"
"abc"
"b"
"bc"
"c"

Once we have that, it's pretty easy to build a word to look for palindrome substrings:

: palindromes ( str -- seq )
    [
        [ dup palindrome? [ , ] [ drop ] if ] each-subseq
    ] { } make ;

We can use the longest word that I implemented for my anagrams post to find the longest palindrome substring:

: longest ( seq -- subseq )
    dup 0 [ length max ] reduce '[ length _ = ] filter ;
Using this on the 1169-character string from the original challenge, we find 52 unique palindromes of 2 or more characters. The longest palindrome substring I found was a 7-character sequence.

Friday, October 22, 2010

Presentations in Factor

Just a few days ago, Daniel Ehrenberg gave a presentation at the Dynamic Language Symposium 2010 in Reno, Nevada. It was quite good, and gave a nice overview of the different language features that Factor supports.

Like many of the talks about Factor, it was made (and presented!) using Factor. Specifically, the slides vocabulary, which is a simple DSL for presentations. To show how it works, let's just jump directly to an example:

( scratchpad ) USING: help.markup slides ;

( scratchpad ) {
                   { $slide "Factor!"
                       { $url "http://factorcode.org" }
                       "Development started in 2003"
                       "Open source (BSD license)"
                       "Influenced by Forth, Lisp, and Smalltalk"
                       "Blurs the line between language and library"
                       "Interactive development"
                   }
               } slides-window

When run, this pops up a window that can be used for presentations. It will look something like this:


You can control the slideshow window using the keyboard:

  • Press DOWN to move to the next slide.
  • Press UP to move to the previous slide.
  • Press F to toggle full-screen mode.

In addition, the slides vocabulary supports some other features, such as embedding code and clickable links to vocabularies or words. Here is an example which does both of these:

( scratchpad ) {
                   { $slide "Code!"
                       "Try clicking on these:"
                       { $code "2 2 +" }
                       { $vocab-link "sequences" }
                       { $link nth }
                   }
               } slides-window

If you want to customize the look-and-feel of the presentation, you need to modify the stylesheet used to render the presentation.

To see it in action, you can watch Slava Pestov give a Google Tech Talk on Factor. The slides he used are available in the Factor source tree under extra/google-tech-talk.

Slides for most of the other talks that have been given are also available (they are a nice way to get a quick overview of the "big features" in Factor):

It's so cool, even Zed Shaw gave a presentation using the slides vocabulary.

Friday, October 15, 2010

Text-to-PDF

In this article, we will be building step-by-step a program for converting text files into PDF. The PDF specification is at version 1.7 (approximately 750 pages plus some supplements) and is available for download from the Adobe website.

The entire solution listed below is approximately 140 lines of code, and compares favorably to a 600 line Python version and a 450 line C version.

First, we will need a list of imports and a namespace:

USING: assocs calendar combinators environment formatting
grouping io io.files kernel make math math.ranges sequences
splitting xml.entities ;

IN: text-to-pdf

PDF files are essentially text files that contain numbered objects with display instructions and a cross-reference table that allows random access to those objects. We will need a word to create an object from its contents and number (e.g., start with n 0 obj and end with endobj):

: pdf-object ( str n -- str' )
    "%d 0 obj\n" sprintf "\nendobj" surround ;

References (used to provide pointers to objects) are of the form n 0 R where n is the number of the object being referred to.

The PDF standard has support for various data types, one of which are text strings. These strings require escaping to be able to include certain characters (and are surrounded by parentheses):

: pdf-string ( str -- str' )
    H{
        { HEX: 08    "\\b"  }
        { HEX: 0c    "\\f"  }
        { CHAR: \n   "\\n"  }
        { CHAR: \r   "\\r"  }
        { CHAR: \t   "\\t"  }
        { CHAR: \\   "\\\\" }
        { CHAR: (    "\\("  }
        { CHAR: )    "\\)"  }
    } escape-string-by "(" ")" surround ;

Hashtables of key/value properties are used frequently within the specification and are essentially space-separated key/value pairs surrounded by << and >>.

Our PDF file will start with an "info" section that contains a creation timestamp, author, and software details:

: pdf-info ( -- str )
    [
        "<<" ,
        "/CreationDate D:" now "%Y%m%d%H%M%S" strftime append ,
        "/Producer (Factor)" ,
        "/Author " "USER" os-env "unknown" or pdf-string append ,
        "/Creator (created with Factor)" ,
        ">>" ,
    ] { } make "\n" join ;

We will follow it with a catalog indicating which object (by reference) contains the list of pages.

: pdf-catalog ( -- str )
    {
        "<<"
        "/Type /Catalog"
        "/Pages 4 0 R"
        ">>"
    } "\n" join ;

As is typical of "text2pdf" programs, we will use the Courier monospace font. We can just refer to it by name since it is one of the fonts directly supported by the PDF specification.

: pdf-font ( -- str )
    {
        "<<"
        "/Type /Font"
        "/Subtype /Type1"
        "/BaseFont /Courier"
        ">>"
    } "\n" join ;

Each page is composed of two objects: a resource object and a contents object. We convert the number of pages into an index specifying the canvas size (US Letter - "612 by 792 points" or "8.5 by 11 inches") and a list of pages (or "kids") specified as references to each pages' resource object:

: pdf-pages ( n -- str )
    [
        "<<" ,
        "/Type /Pages" ,
        "/MediaBox [ 0 0 612 792 ]" ,
        [ "/Count %d" sprintf , ]
        [
            5 swap 2 range boa
            [ "%d 0 R " sprintf ] map concat
            "/Kids [ " "]" surround ,
        ] bi
        ">>" ,
    ] { } make "\n" join ;

The resources for each page n includes a reference to the contents object (n + 1) and the font object:

: pdf-page ( n -- page )
    [
        "<<" ,
        "/Type /Page" ,
        "/Parent 4 0 R" ,
        1 + "/Contents %d 0 R" sprintf ,
        "/Resources << /Font << /F1 3 0 R >> >>" ,
        ">>" ,
    ] { } make "\n" join ;

Pages are essentially objects which contain a stream of text operations. The stream is prefixed with the length of its contents:

: pdf-stream ( str -- str' )
    [ length 1 + "<<\n/Length %d\n>>" sprintf ]
    [ "\nstream\n" "\nendstream" surround ] bi append ;

The text operations that we will use to draw lines of text on each page:

  • BT - begin text
  • Td - location (where 0,0 is the bottom left of the page)
  • Tf - font and size
  • TL - line height
  • ' - insert newline and draw a line text
  • ET - end text
: pdf-text ( lines -- str )
    [
        "BT" ,
        "54 738 Td" ,
        "/F1 10 Tf" ,
        "12 TL" ,
        [ pdf-string "'" append , ] each
        "ET" ,
    ] { } make "\n" join pdf-stream ;

Using 10-point Courier font (6 points wide by 10 points tall), 12-points of line spacing, and 3/4 inch left and right margins: each page supports 57 lines of text (where each line is 84 characters long). We use splitting and grouping words to convert a string into pages of text:

: string>lines ( str -- lines )
    "\t" split "    " join string-lines
    [ [ " " ] when-empty ] map ;

: lines>pages ( lines -- pages )
    [ 84 <groups> ] map concat 57 <groups> ;

We can then take these "pages" and assemble PDF objects (including the info, catalog, font, and page index objects):

: pages>objects ( pages -- objects )
    [
        pdf-info ,
        pdf-catalog ,
        pdf-font ,
        dup length pdf-pages ,
        dup length 5 swap 2 range boa zip
        [ pdf-page , pdf-text , ] assoc-each
    ] { } make
    dup length [1,b] zip [ first2 pdf-object ] map ;

Given a list of objects, and a 9-byte %PDF-1.4 version header in the file, we can write a PDF complete with cross-reference table trailer and %%EOF "end-of-file" marker:

: pdf-trailer ( objects -- str )
    [
        "xref" ,
        dup length 1 + "0 %d" sprintf ,
        "0000000000 65535 f" ,
        9 over [
            over "%010X 00000 n" sprintf , length 1 + +
        ] each drop
        "trailer" ,
        "<<" ,
        dup length 1 + "/Size %d" sprintf ,
        "/Info 1 0 R" ,
        "/Root 2 0 R" ,
        ">>" ,
        "startxref" ,
        [ length 1 + ] map-sum 9 + "%d" sprintf ,
        "%%EOF" ,
    ] { } make "\n" join ;

: objects>pdf ( objects -- str )
    [ "\n" join "\n" append "%PDF-1.4\n" ]
    [ pdf-trailer ] bi surround ;

Putting it all together, we can convert a text string to PDF, and a text file to PDF:

: text-to-pdf ( str -- str' )
    string>lines lines>pages pages>objects objects>pdf ;

: file-to-pdf ( path encoding -- )
    [ file-contents text-to-pdf ]
    [ [ ".pdf" append ] dip set-file-contents ] 2bi ;

Running this code on itself, produces this PDF file (so you can see what it will look like).

As usual, the code is available on my Github.

Wednesday, October 13, 2010

Is Rotation?

I noticed an "interview question" that was posted on StackOverflow awhile ago. It's not particularly complicated -- basically asking "given two strings, how to tell if one is the rotated version of the other?"

Some discussion in the question deals with various faster methods, but the simplest answer is a Python version:

def isrotation(s1, s2):
    return len(s1) == len(s2) and s1 in 2*s2

If we wanted to implement this in Factor, we might want to consider using "short circuit" combinators (which will apply a series of boolean tests and stop on the first test that fails). We will also use the convention that a word? (ending in a "?") returns a boolean.

: rotation? ( s1 s2 -- ? )
    { [ [ length ] bi@ = ] [ dup append subseq? ] } 2&& ;

We can test it, to make sure it works:

( scratchpad ) "stack" "tacks" rotation? .
t

( scratchpad ) "foo" "bar" rotation? .
f

Since strings are sequences of characters and this solution uses sequence operations (length, append, and subseq?), it is already generalized to operate on other types of sequences. For example, arrays:

( scratchpad ) { 1 2 3 } { 2 3 1 } rotation? .
t

So, the next time you get this question in an interview, maybe you can solve it in Factor!

Monday, September 27, 2010

Fortune-telling

One program that some new users of Unix-like systems are introduced to when learning the command-line is fortune. The purpose of the program is simple: to display a random message from a database of quotations.

Note: fortune is available on the Mac OS X using MacPorts (port install fortune), Ubuntu (apt-get install fortune-mod), and Fedora (yum install fortune-mod).

A demonstration of running it:

$ fortune
The human race has one really effective weapon, and that is laughter.
  -- Mark Twain

I thought it would be fun to get this to work in Factor. We're going to do it two different ways. The first way is to use the io.launcher vocabulary for launching processes and printing the output directly to the listener.

( scratchpad ) "/opt/local/bin/fortune" utf8 
               [ contents print ] with-process-reader
It'll be just like Beggars Canyon back home.
  -- Luke Skywalker

Another way is to parse the fortune files directly, and then choose a random quotation to print (i.e., implementing the fortune logic inside of Factor). First, we will need to list the vocabularies we will be using and define a "fortune" namespace:

USING: io io.encodings.ascii io.files kernel make memoize
random sequences splitting strings ;

IN: fortune

The fortune program uses data files which are stored in a "cookie-jar format". From faqs.org:

It is appropriate for records that are just bags of unstructured text. It simply uses newline followed by %% (or sometimes newline followed by %) as a record separator.

Parsing a fortune data file (given a string containing the contents in cookie-jar format) is pretty straightforward. We are using slices which allow us to keep pointers to sections of text within the data file.

: parse-fortune ( str -- seq )
    [
        [ "%\n" split1-slice dup ]
        [ swap , ] while drop ,
    ] { } make ;

: load-fortunes ( path -- seq )
    ascii file-contents parse-fortune ;

The fortune data files are installed in different locations on different systems, so we can define a search path which includes some of the locations that are often used. We can then use this list to load the fortunes from the first data file that exists (and memoize the results for performance):

CONSTANT: FORTUNES {
    "/usr/games/fortune/fortunes"
    "/usr/share/games/fortune/fortunes"
    "/usr/local/share/games/fortune/fortunes"
    "/opt/local/share/games/fortune/fortunes"
}

MEMO: default-fortunes ( -- seq )
    FORTUNES [ exists? ] find nip load-fortunes ;

Getting a fortune is then just a matter of choosing a random quotation and printing it out:

: fortune ( -- )
    default-fortunes random >string print ;

If you want, you could define a MAIN: entry point and use the deploy tool to make this into a deployed binary.

The source code for this is available on my Github.

Update: Doug Coleman suggested a one-liner that can produce the same result:

"/opt/local/share/games/fortune/fortunes" ascii file-lines
{ "%" } split random [ print ] each