Wednesday, December 19, 2012

Today In History

For every day of the year, Wikipedia maintains a list of famous events, births, and deaths in history. For example, if you look at today in history, you will see that in 1843, Charles Dickens novel, A Christmas Carol first went on sale.

Wouldn't it be great if you could use Factor to parse and display these famous historical events?

USING: accessors calendar formatting http.client io kernel
sequences xml xml.traversal ;

Using our calendar vocabulary, we will build a word that converts a timestamp into a Wikipedia URL:

: historical-url ( timestamp -- url )
    [ month-name ] [ day>> ] bi
    "http://en.wikipedia.org/wiki/%s_%s" sprintf ;

If you look at the source for these Wikipedia pages, you will find that it contains a series of unordered lists. In particular, we are interested in the second list containing "Events". The third list contains "Births" and the fourth contains "Deaths".

: historical-events ( timestamp -- events )
    historical-url http-get nip string>xml
    "ul" deep-tags-named second children-tags
    [ deep-children>string ] map ;

Sometimes it is a nice convention to build a word that produces output:

: historical-events. ( timestamp -- )
    historical-events [ print ] each ;

You can try this out:

IN: scratchpad today historical-events.
211 – Publius Septimius Geta, co-emperor of Rome, is lured
to come without his bodyguards to meet his brother Marcus
Aurelius Antoninus (Caracalla), to discuss a possible
reconciliation. When he arrives the Praetorian Guard murders
him and he dies in the arms of his mother Julia Domna.
324 – Licinius abdicates his position as Roman Emperor.
1154 – Henry II of England is crowned at Westminster Abbey.
1490 – Anne, Duchess of Brittany, is married to Maximilian I,
Holy Roman Emperor by proxy.
...

The code for this is on my GitHub, along with words to output historical births and deaths and formatted output that includes links to Wikipedia.

Friday, November 30, 2012

New Logo

For a long time now, the Factor logo has been a cool looking velociraptor designed in 2007 by Elie Chaftari. As part of supporting retina displays, we need new higher resolution icons. Unfortunately, Elie doesn't have the original artwork anymore to produce them.

With the help of 99designs, we ran a contest to create a new logo and icon for the Factor programming language (to capture the increase in speed and functionality gained in the last few years).

This is our new logo:

And incorporating the logo into a new application icon:

The website has been updated and the latest development version includes new icon files.

Thursday, November 15, 2012

Retina Displays

I recently purchased a MacBook Pro with Retina Display. Shortly after receiving it, I realized that Factor did not support rendering on its Retina Display. Magnified (and blurry) text looks pretty terrible next to the crisp native fonts on these HiDPI displays.

Well, after a series of commits, the latest development version of Factor now supports it! There are still a few minor rendering issues with our newly supported "resolution independence", but we plan on fixing those in the near future.

To use the development version, follow the directions from the README.

Thursday, October 25, 2012

The 3n+1 Problem

Yesterday, a lengthy blog post from the Racket community discussed the "3n+1" problem (also known as the Collatz conjecture):

Consider a positive number n. The cycle length of n is the number of times we repeat the following, until we reach n=1:

  • If n is odd, then n = 3n+1
  • If n is even, then n = n/2

For example, given n=22, we see the following sequence: 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1. The cycle length of 22 is, therefore, 16, since it took 16 repetitions to get to 1.

Given a definition of cycle length of a number, here’s the rest of the problem: given any two numbers i and j, compute the maximum cycle length over all numbers between i and j, inclusive.

Solution

Solving this in Factor could start by defining a word to take a number and compute its next value in the cycle. Since this is an internal word, we won't add the check for the end condition of n=1:

: next-cycle ( x -- y )
    dup odd? [ 3 * 1 + ] [ 2 / ] if ;

Creating the "cycle sequence" for a given number can be done by:

: cycles ( n -- seq )
    [ dup 1 > ] [ [ next-cycle ] keep ] produce swap suffix ;

We could always find the "cycle length" by calling cycles length, but a slightly faster way would not create an intermediate sequence:

: cycle-length ( n -- m )
    1 [ over 1 > ] [ [ next-cycle ] [ 1 + ] bi* ] while nip ;

We can find the maximum cycle (number and it's cycle length) for any given sequence of numbers:

: max-cycle ( seq -- elt )
    [ dup cycle-length ] { } map>assoc [ second ] supremum-by ;

For convenience, we can make some helper words to find either the number and/or the maximum cycle length:

: max-cycle-value ( seq -- n ) max-cycle first ;

: max-cycle-length ( seq -- m ) max-cycle second ;

Testing

Factor strives to be a concise language, some of which is natural for a concatenative style. You can see this in the simple approach to unit testing:.

{
    { 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1 }
} [ 22 cycles ] unit-test

{ 1  } [ 1 cycle-length ] unit-test
{ 2  } [ 2 cycle-length ] unit-test
{ 6  } [ 5 cycle-length ] unit-test
{ 16 } [ 22 cycle-length ] unit-test

{ 20  } [ 1 10 [a,b] max-cycle-length ] unit-test
{ 125 } [ 100 200 [a,b] max-cycle-length ] unit-test
{ 89  } [ 201 210 [a,b] max-cycle-length ] unit-test
{ 174 } [ 900 1000 [a,b] max-cycle-length ] unit-test

Performance

The original post discussed performance. Since many cycles have numbers in common, we could change our approach to a recursive one, with memoization:

MEMO: fast-cycle-length ( n -- m )
    dup 1 > [ next-cycle fast-cycle-length 1 + ] [ drop 1 ] if ;

Indeed, this takes roughly 15-20% of the time that our previous approach took to compute the number of cycles for the numbers 1 to 100,000 with a cold cache.

Main

Like the original solution, we can add a "main" to allow running from the command line:

: run-cycles ( -- )
    [
        [ blank? ] split-when harvest first2
        [ string>number ] bi@ 2dup [a,b] max-cycle-length
        "%s %s %s\n" printf
    ] each-line ;

MAIN: run-cycles

And using the original sample file, produce the same output:

$ cat sample-data.txt 
1 10
100 200
201 210
900 1000

$ factor cycles.factor < sample-data.txt
1 10 20
100 200 125
201 210 89
900 1000 174

The code for this is available on my Github.

Tuesday, October 23, 2012

Select Random Lines

Ned Batchelder made an interesting blog post on selecting randomly from an unknown sequence. His Python version uses a "random replacement" strategy to choose each line with a 1/n probability, where n is the number of elements seen so far:

import random

def random_element(seq):
    """Return an element chosen at random from `seq`."""
    it = None
    for n, elem in enumerate(seq):
        if random.randint(0, n) == 0:
            it = elem
    return it

random-line

I wanted to show how a similar approach in Factor could be used to select a random line from a file, or any stream that supports a character-based stream protocol.

We can make a variant of each-line that calls a quotation with each line and its "line number" (indexed from 1 to n, the number of lines in the stream):

: each-numbered-line ( ... quot: ( ... line number -- ... ) -- ... )
    [ 1 ] dip '[ swap [ @ ] [ 1 + ] bi ] each-line drop ; inline

Then, it is pretty easy to select a random line (replacing each line with chance of 1/K probability where K is the current line number.

: random-line ( -- line/f )
    f [ random zero? [ nip ] [ drop ] if ] each-numbered-line ;

random-lines

To efficiently select more than one random line from a stream, we will be using a technique called reservoir sampling. The idea is to select the first n elements, then randomly replace them with weighted probability n/K, where K is the current line number:

:: random-lines ( n -- lines )
    V{ } clone :> accum
    [| line line# |
        line# n <= [
            line accum push
        ] [
            line# random :> r
            r n < [ line r accum set-nth ] when
        ] if
    ] each-numbered-line accum ;
Note: we used local variables to implement this.

This is available in the io.random vocabulary in the Factor development branch.

Thursday, October 18, 2012

Parsing IPv4 Addresses

I came across an interesting question on StackOverflow asking about parsing IPv4 addresses. Typically, IPv4 addresses are specified with four components (e.g., something like 192.164.1.55). As the top answer points out, you might be suprised to see that ping interprets addresses a bit oddly:

C:\>ping 1
Pinging 0.0.0.1 with 32 bytes of data:

C:\>ping 1.2
Pinging 1.0.0.2 with 32 bytes of data:

C:\>ping 1.2.3
Pinging 1.2.0.3 with 32 bytes of data:

C:\>ping 1.2.3.4
Pinging 1.2.3.4 with 32 bytes of data:

C:\>ping 1.2.3.4.5
Ping request could not find host 1.2.3.4.5. Please check the name and try again.

C:\>ping 255
Pinging 0.0.0.255 with 32 bytes of data:

C:\>ping 256
Pinging 0.0.1.0 with 32 bytes of data:

In fact, you can reach google.com, using the IP address specified as dotted decimal (74.125.226.4), flat decimal (1249763844), dotted octal (0112.0175.0342.0004), flat octal (011237361004), dotted hex (0x4A.0x7D.0xE2.0x04), flat hex (0x4A7DE204), or even something of each (74.0175.0xe2.4).

Implementation

Of course, my first thought was that Factor should have a parser that works similarly (especially since I implemented support for the ping protocol awhile ago). We want a parse-ipv4 word taking a string representing the address and returning an IPv4 address string that has the typical four components.

First, we need to have words to split a string into numbered parts and a word to join them back together:

: split-components ( str -- array )
    "." split [ string>number ] map ;

: join-components ( array -- str )
    [ number>string ] map "." join ;

Then, we can parse the address simply:

: parse-ipv4 ( str -- ip )
    split-components dup length {
        { 1 [ { 0 0 0 } prepend ] }
        { 2 [ first2 [| A D | { A 0 0 D } ] call ] }
        { 3 [ first3 [| A B D | { A B 0 D } ] call ] }
        { 4 [ ] }
    } case join-components ;

Extras

If we want to support octal addresses, we can convert an octal number like 0112 to something Factor can easily parse (0o112) in our splitting code:

: cleanup-octal ( str -- str )
    dup { [ "0" head? ] [ "0x" head? not ] } 1&&
    [ 1 tail "0o" prepend ] when ;

: split-components ( str -- array )
    "." split [ cleanup-octal string>number ] map ;

And if we want to support the "carry propagation" which allows 256 to mean 0.0.1.0, we need to "bubble" the array before joining:

: bubble ( array -- array' )
    reverse 0 swap [ + 256 /mod ] map reverse nip ;

: join-components ( array -- str )
    bubble [ number>string ] map "." join ;

This (along with some error handling) has been committed to the Factor repository in the ip-parser vocabulary. If it proves useful, it might be nice to change the io.sockets to use this when resolving IPv4 addresses...

Monday, September 24, 2012

COLOR: <TAB>

As a diversion today, I added "color tab completion" to the Factor environment. I even made the color names render with their assigned color value:

For the curious, I extended the tools.completion vocabulary to implement a "colors-matching" word and then added a "colors-completion" type to the UI.

Tuesday, September 18, 2012

Faster Tables!

We've known for awhile that the Factor user interface is a little bit slow when displaying tables with many rows. I wasn't sure this was a big problem until a detailed bug report was filed (thanks @k7f!) that pointed out the opengl.gl vocabulary (with over 2,000 symbols in it) was "virtually unscrollable".

After poking around at the profiler output, I noticed that some UI gadgets had "baseline alignment" support which was used in the layout process. Some of these calculations were quite expensive and could easily be cached.

After a few changes, and introducing the concept of an "aligned gadget" which provides a mechanism to cache these alignment calculations, the original test case is almost 10 times as responsive as before!

This is available in the master branch on Github.

Monday, September 10, 2012

Watching Words

Function "annotations" are a feature that many languages support. They may take various forms such as decorators in Python or class and method annotations in Java -- having direct (like Python) or no direct (like Java) effect on the code it annotates. Factor has some similar features that I'll demonstrate below.

We can define a simple word to use in this example (that adds 2 to its input):

: foo ( x -- n ) 2 + ;

Watch

Using the tools.annotations vocabulary, we can attach a "watch" annotation that prints the inputs and outputs of a word when it is called:

IN: scratchpad \ foo watch

If you print the word definition, you can see how it was modified:

IN: scratchpad \ foo see
USING: math tools.annotations.private ;
IN: scratchpad
: foo ( x -- n ) \ foo entering 2 + \ foo leaving ;

You can call this word (either directly, or indirectly by calling another word which calls it) and see its inputs and outputs. A nice feature of this is that the UI lets you click on these values and see more detail (particularly useful if they are tuples or more complex objects):

IN: scratchpad 3 foo
--- Entering foo
x 3
--- Leaving foo
n 5

Reset

You can stop watching a word by calling "reset" on it (or right-clicking on its definition in the UI and choosing "reset") to remove all of its annotations. Currently, a word can only be annotated once.

IN: scratchpad \ foo reset

Timing

In addition to "watching", you can also use annotations to track the total running time of a word:

IN: scratchpad \ foo add-timing

IN: scratchpad 0 10,000 [ foo ] times .
20000

IN: scratchpad word-timing.
foo 0.000594456

Other

It also supports arbitrary annotations, such as adding "before" and "after" logic:

IN: scratchpad \ foo [ "hi" print ] [ "bye" print ]
               [ surround ] 2curry annotate

IN: scratchpad 3 foo .
hi
bye
5

These annotations can be pretty powerful and were even used to build our code coverage tool.

Wednesday, August 29, 2012

Literate Programming

Donald Knuth pioneered Literate programming as a technique for writing structured programs. The literate programming world typically has more descriptive text than code, so rather than "comment out" the descriptive text, they "comment in" the code.

It is very popular in some communities, for example Haskell, where even many blog posts are written in a Literate Haskell style. This is done by creating a .lhs file instead of a .hs file. In it, all lines starting with > are interpreted as code, everything else is considered a comment.

Graham Telfer asked a question on the mailing list if Factor supported literate programming. I thought it might be fun to implement some of the ideas quickly, below I'll share how it works.

The Lexer

Factor uses a lexer object to turn a stream of text into tokens that are used by the parser to turn tokens into Factor objects and definitions. We can extend the lexer system to create a version that skips over any lines that are not prefixed by a > character.

First, we define our literate-lexer sub-class:

TUPLE: literate-lexer < lexer ;

: <literate-lexer> ( text -- lexer ) literate-lexer new-lexer ;

Second, we implement the skip-blank word to skip over all lines that are just comments:

M: literate-lexer skip-blank
    dup column>> zero? [
        dup line-text>> [
            "> " head?
            [ [ 2 + ] change-column call-next-method ]
            [ [ nip length ] change-lexer-column ]
            if
        ] [ drop ] if*
    ] [ call-next-method ] if ;

We can then create a quick syntax word that looks for an end token and parses all the lines between:

SYNTAX: <LITERATE
    "LITERATE>" parse-multiline-string string-lines [
        <literate-lexer> (parse-lines) append!
    ] with-nested-compilation-unit ;

Try It

Now, you can do something like this:

IN: scratchpad <LITERATE
               This is a big wall of text with no Factor code...
               Does this work?
               1 1 + .

               I bet it didn't... maybe the following works:
               > 8675309 .
               Yay!

               You can create functions that span multiple lines
               > : foo ( -- x )
               We interrupt this program to bring you this:
               > 12 ;

               Now, we can run foo:
               > foo .
               LITERATE>
8675309
12

It might be nice to automatically support .lfactor files, but this is a quick prototype to see if it makes sense. Not bad for ten or so lines of code?

It is available now in the development branch of Factor, in the literate vocabulary.

Thursday, August 16, 2012

Factor 0.95 now available

"The reports of my death are greatly exaggerated" - Mark Twain

I'm very pleased to announce the release of Factor 0.95!

OS/CPUWindowsMac OS XLinux
x86
x86-64

Source code: 0.95

Note: it looks like the Mac OS X binaries unintentionally require 10.8 or greater, due to an upgrade of our build farm. If you want to use it on 10.6 or 10.7 before we make a fix, you can build from source:

git clone https://github.com/slavapestov/factor.git && cd factor && ./build-support/factor.sh update

This release is brought to you with over 2,500 commits by the following individuals:

Alex Vondrak, Alfredo Beaumont, Andrew Pennebaker, Anton Gorenko, Brennan Cheung, Chris Double, Daniel Ehrenberg, Doug Coleman, Eric Charlebois, Eungju Park, Hugo Schmitt, Joe Groff, John Benediktsson, Jon Harper, Keita Haga, Maximilian Lupke, Philip Searle, Philipp Br├╝schweiler, Rupert Swarbrick, Sascha Matzke, Slava Pestov, @8byte-jose, @yac, @otoburb, @rien

In addition to lots (and lots!) of bug fixes and improvements, I want to highlight the following features:

  • GTK-based UI backend
  • Native image loaders using Cocoa, GTK, and GDI+
  • Sampling profiler replacing counting profiler
  • Code coverage tool to improve unit tests
  • VM and application-level signal handlers
  • ICMP support and an IPv4 "ping" implementation
  • DNS client and "host" implementation
  • Support frexp and log of really big numbers
  • Cross-platform open URL in webbrowser
  • Cross-platform send file to trash (Mac OS X, Linux, Windows)
  • Speedup bignum GCD and ratio operations
  • Speedup in thread yield performance on Mac OS X
  • CSV library is 3x faster
  • XML library is 2x faster
  • JSON library is 2-3x faster
  • Many stability and performance enhancements

Some possible backwards compatibility issues:

  • Change alien references from "<int>" to "int <ref>" and "*int" to "int deref"
  • Removed Windows CE, BSD, and Solaris platform support
  • Natively support binary (0b), octal (0o), and hexadecimal (0x) number syntax
  • Unify "( -- )" and "(( -- ))" stack effect syntax
  • Change prepend to return type of first sequence to match append behavior
  • Change ".factor-rc" to be ".factor-rc" on all platforms
  • Cleanup specialized array syntax to be more generic and consistent
  • Change to quadratic probing instead of linear probing in hashtables
  • Allow dispatching on anonymous intersections/unions

What is Factor

Factor is a concatenative, stack-based programming language with high-level features including dynamic types, extensible syntax, macros, and garbage collection. On a practical side, Factor has a full-featured library, supports many different platforms, and has been extensively documented.

The implementation is fully compiled for performance, while still supporting interactive development. Factor applications are portable between all common platforms. Factor can deploy stand-alone applications on all platforms. Full source code for the Factor project is available under a BSD license.


New Libraries


Improved Libraries:

Monday, August 6, 2012

Echo Server

Factor has nice cross-platform support for network programming. As is fairly typical, I am going to use an "echo server" to demonstrate how the libraries work.

The basic logic for an "echo server" is to read input from a client, write it back to them, and repeat until the client disconnects. Using the input and output stream abstraction, we can build a word which does this in a general manner, recursing until f is read indicating end-of-stream:

: echo-loop ( -- )
    1024 read-partial [ write flush echo-loop ] when* ;

Our "echo server" will use TCP (i.e., connection-oriented networking) and the general server abstraction that comes with Factor with a binary encoding. The server framework uses the name "echo.server" to automatically log client-related messages (such as connect and disconnect events as well as errors) to $FACTOR/logs/echo.server. We specify the echo-loop quotation as the handler for clients:

: <echo-server> ( port -- server )
    binary <threaded-server>
        swap >>insecure
        "echo.server" >>name
        [ echo-loop ] >>handler ;

We can start an echo server on, for example, port 12345:

IN: scratchpad 12345 <echo-server> start-server

Testing this is pretty easy using Netcat or a similar client:

$ nc localhost 12345
Hello, Factor
Hello, Factor

This is available as the echo-server vocabulary.

Monday, July 23, 2012

xterm-256color

For those of you using terminals, you are probably aware of ANSI escape codes. As a refresher, it allows formatting text with foreground and background colors, underlines, italics, blinking, among other characteristics.

Factor has support for formatted output streams that can apply styles to textual content. It is used extensively by the help system to produce documentation in both the UI and command-line, export to HTML or PDF, and even syntax highlighting.

This morning, I wanted to implement a formatted output stream that can be used in the terminal to produce colored text output easily.

You can look at documentation for the xterm-256color ANSI extensions, but basically you wrap your text with format codes, like so:

Setting foreground to blue:

Setting background to red:

There are varying methods of checking if a terminal supports the 256 color extension, but a simple way is to look at the TERM environment variable:

: 256color-terminal? ( -- ? )
    "TERM" os-env "-256color" tail? ;

The specification should tell us how to map RGB color codes to their "256color" code:

RGB Colors

A value between 16 and 231 is used for RGB colors with each color having 6 intensities. Red has a value between 0 and 5 multiplied by 36, Green has a value between 0 and 5 multiplied by 6, and Blue has a value between 0 and 5.

8 to 24 bit color conversion

To display xterm RGB colors as 24 bit RGB colors the following values are suggested for the 6 intensities: 0x00, 0x5F, 0x87, 0xAF, 0xD7, and 0xFF.

Following that guideline, we can build our map of "256color" codes. I also added system colors and grayscale colors, which is left as an exercise for the reader (or you can read the source code):

CONSTANT: intensities { 0x00 0x5F 0x87 0xAF 0xD7 0xFF }

CONSTANT: 256colors H{ }

! Add the RGB colors
intensities [| r i |
    intensities [| g j |
        intensities [| b k |
            i 36 * j 6 * + k + 16 +
            r g b 3array
            256colors set-at
        ] each-index
    ] each-index
] each-index

Given a RGBA Color, we want to convert it to an array of its red, green, and blue components, and then find the "256color" with the smallest "distance" (in this case Euclidean distance) from the desired value.

: color>rgb ( color -- rgb )
    [ red>> ] [ green>> ] [ blue>> ] tri
    [ 255 * round >integer ] tri@ 3array ;

: color>256color ( color -- 256color )
    color>rgb '[ _ distance ]
    256colors [ keys swap infimum-by ] [ at ] bi ;

Converting a color to either a foreground or a background ANSI escape code is now pretty easy:

: color>foreground ( color -- str )
    color>256color "\u00001b[38;5;%sm" sprintf ;

: color>background ( color -- str )
    color>256color "\u00001b[48;5;%sm" sprintf ;

With a basic implementation of the formatted stream protocol, we can now take something like this:

And easily (and automatically!) display it as something like this:

To see more colorful implementation of words:

Or even experiment with various background colors:

This has been committed to Factor's development branch and is available now.

Tuesday, May 29, 2012

Wake-on-LAN

Wake-on-LAN is an ethernet standard that allows a computer to be turned on or woken up by a network message. This is a kinda-fun-sometimes-useful feature that I thought we should have in Factor.

The way the protocol works is to broadcast a "magic packet" that contains information describing the computer you want to wakeup. From the Wikipedia article:

The magic packet is a broadcast frame containing anywhere within its payload 6 bytes of all 255 (FF FF FF FF FF FF in hexadecimal), followed by sixteen repetitions of the target computer's 48-bit MAC address, for a total of 102 bytes. Since the magic packet is only scanned for the string above, and not actually parsed by a full protocol stack, it may be sent as any network- and transport-layer protocol, although it is typically sent as a UDP datagram to port 7 or 9, or directly over Ethernet as EtherType 0x0842.

First, we need a way to turn a MAC address into an array of bytes:

: mac-address-bytes ( mac-address -- byte-array )
    ":-" split [ hex> ] B{ } map-as ;

Next, we need a way to construct the "magic packet" according to the specification:

: wake-on-lan-packet ( mac-address -- bytearray )
    [ 16 ] [ mac-address-bytes ] bi* <array> concat
    B{ 0xff 0xff 0xff 0xff 0xff 0xff } prepend ;

Now that we have the "magic packet", we can create a datagram port configured to send broadcast packets (using a recent commit that made this easy and cross-platform):

: wake-on-lan ( mac-address broadcast-ip -- )
    [ wake-on-lan-packet ] [ 9 <inet4> ] bi*
    f 0 <inet4> <broadcast> [ send ] with-disposal ;

It's easy enough to test this using Mac OS X since most modern Mac hardware supports Wake-on-LAN. You can configure it via the System Preferences Energy Saver panel, in the Options tab. Marking the "Wake for network access" checkbox enables Wake-on-LAN support. For other platforms, read more about receiving the magic packet.

The code for this is on my Github.