Wednesday, March 30, 2011

ASCII Art

Yesterday, Raymond Hettinger, an active contributor to Python, tweeted a fun one-liner that works in Python 3.2. I thought I would show how to translate it into Factor.

Fun #python3.2 one-liner: print('\n'.join('*'*(c//2000) for i,c in sorted(Counter(map(sum, product(range(6), repeat=8))).items())))less than a minute ago via web


The original Python "one-liner", including required imports looks like this:

from collections import Counter
from itertools import product

print('\n'.join('*'*(c//2000) for i,c in
    sorted(Counter(map(sum, product(range(6), repeat=8))).items())))

To translate this code into a concatenative language, we are going to work from the inside and move out. In this case, starting with the sum of the cartesian product of eight sequences of the numbers 0 through 5:

map(sum, product(range(6), repeat=8))
vs.
8 [ 6 iota ] replicate [ sum ] product-map

Next, we see that it counts each element, and produces a sorted list of items:

sorted(Counter(...).items())
vs.
histogram >alist sort-keys

And finally, prints a line of stars for each element:

print "\n".join('*'*(c//2000) for i,c in ...)
vs.
values [ 2000 /i CHAR: * <string> print ] each

Putting it all together, it makes this nice "ASCII Art" visualization:

( scratchpad ) USING: assocs io math math.statistics sequences
               sequences.product sorting strings ;

( scratchpad ) 8 [ 6 iota ] replicate [ sum ] product-map
               histogram >alist sort-keys values
               [ 2000 /i CHAR: * <string> print ] each







*
***
*****
********
************
******************
*************************
********************************
*****************************************
*************************************************
********************************************************
**************************************************************
******************************************************************
*******************************************************************
******************************************************************
**************************************************************
********************************************************
*************************************************
*****************************************
********************************
*************************
******************
************
********
*****
***
*





Monday, March 28, 2011

Unique Hash

Recently, I stumbled onto a blog post from 2009 which discussed a way to generate random-looking strings from a series of unique numeric id's, using PHP. The author uses a base-62 encoding ([0-9][A-Z][a-z]) to convert a number to a unique string using a series of prime numbers to reduce the potential of collisions. Below, I show how it might look in Factor.

First, we implement a simple base-62 encoder:

CONSTANT: CHARS
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

: base62 ( n -- string )
    [ dup 0 > ] [ 62 /mod CHARS nth ] "" produce-as reverse nip ;

Next, we define a series of prime numbers (which should be kept "secret" to make the algorithm hard to predict) chosen to be close to the next highest prime from the golden mean of the number of possible permutations for each string length:

CONSTANT: PRIMES
{ 1 41 2377 147299 9132313 566201239 35104476161 2176477521929 }

Finally, we can implement the "unique hash" function:

:: udihash ( n chars -- string )
    chars PRIMES nth n * 62 chars ^ mod base62
    chars CHAR: 0 pad-head ;

Try it out and see that it produces the same random-looking results as the original author. For example, a 5-character hash of the numbers 1 through 10:

( scratchpad ) 10 [1,b] [ 5 udihash print ] each
cJio3
EdRc6
qxAQ9
TGtEC
5ac2F
huKqI
KE3eL
wXmSO
YrVGR
BBE4U

You can find a discussion on StackOverflow, a similar approach used in the SilverStripe CMS to shorten URLs, and lots of search results probably inspiring (or inspired by) the original blog post.

If you're curious how to select your prime numbers, you can use math.primes to create your own list:

( scratchpad ) CONSTANT: golden-ratio 1.618033988749894848

( scratchpad ) 5 [1,b] [
                   62 swap ^ golden-ratio /i next-prime
               ] map .
{ 41 2377 147299 9132313 566201239 }

The code for this is on my Github.

Friday, March 25, 2011

Sum

Today's programming challenge is to implement the "old Unix Sys V R4" sum command:

"The original sum calculated a checksum as the sum of the bytes in the file, modulo 216−1, as well as the number of 512-byte blocks the file occupied on disk. Called with no arguments, sum read standard input and wrote the checksum and file blocks to standard output; called with one or more filename arguments, sum read each file and wrote for each a line containing the checksum, file blocks, and filename."

First, some imports:

USING: command-line formatting io io.encodings.binary io.files
kernel math math.functions namespaces sequences ;

Short Version

A quick file-based version might look like this:

: sum-file. ( path -- )
    [
        binary file-contents
        [ sum 65535 mod ] [ length 512 / ceiling ] bi
    ] [ "%d %d %s\n" printf ] bi ;

You can try it out:

( scratchpad ) "/usr/share/dict/words" sum-file.
19278 4858 /usr/share/dict/words

The main drawbacks to this version are: loading the entire file into memory (which might be a problem for big files), not printing an error if the file is not found, and not supporting standard input.

Full Version

A more complete version might begin by implementing a function that reads from a stream, computing the checksum and the number of 512-byte blocks:

: sum-stream ( -- checksum blocks )
    0 0 [ 65536 read-partial dup ] [
        [ sum nip + ] [ length + nip ] 3bi
    ] while drop [ 65535 mod ] [ 512 / ceiling ] bi* ;

The output should look like CHECKSUM BLOCKS FILENAME:

: sum-stream. ( path -- )
    [ sum-stream ] dip "%d %d %s\n" printf ;

We can generate output for a particular file (printing FILENAME: not found if the file does not exist):

: sum-file. ( path -- )
    dup exists? [
        dup binary [ sum-stream. ] with-file-reader
    ] [ "%s: not found\n" printf ] if ;

And, to prepare a version of sum that we can deploy as a binary and run from the command line, we build a simple MAIN: word:

: run-sum ( -- )
    command-line get [ "" sum-stream. ] [
        [ sum-file. ] each
    ] if-empty ;

MAIN: run-sum

The code for this is on my Github.

Wednesday, March 23, 2011

Look and Say

There was a programming challenge a week ago that asked for solutions that create the Look and Say sequence. Below is an implementation in Factor.

The “Look and Say” sequence, Sloane number A005150, begins 1, 11, 21, 1211, 111221, 312211, 13112221, 1113213211, …. Each term is constructed from its predecessor by stating the frequency and number of each group of like digits. For instance, the term after 1211 is “one 1, one 2, and two 1s”, or 111221.

We can use the splitting.monotonic vocabulary to split a number, recombine it based on the length of intermediate sequences of digits, and produce the next value in a "Look and Say" sequence:

USING: formatting kernel math.parser sequences splitting.monotonic ;

: look-and-say ( n -- n' )
    number>string [ = ] monotonic-split [
        [ length ] [ first ] bi "%d%c" sprintf
    ] map concat string>number ;

You can try it out in the Listener to see the first 10 numbers in the sequence:

( scratchpad ) 1 10 [ dup . look-and-say ] times
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211

Monday, March 21, 2011

Names of Gaddafi

A question was posted yesterday on StackOverflow asking how to create a regular expression to search for the many names of Libyan leader, Muammar al-Gaddafi. I thought it would be fun to explore this problem using Factor.

Someone helpfully posted an image that demonstrates some of the variations of his full name:


List of Names

One approach would be to list out all his possible names, and then check to see if a given string is in the list:

CONSTANT: names {
    "Gadaffi"
    "Gadafi"
    "Gadafy"
    "Gaddafi"
    "Gaddafy"
    "Gaddhafi"
    "Gadhafi"
    "Gathafi"
    "Ghadaffi"
    "Ghadafi"
    "Ghaddafi"
    "Ghaddafy"
    "Gheddafi"
    "Kadaffi"
    "Kadafi"
    "Kad'afi"
    "Kaddafi"
    "Kadhafi"
    "Kazzafi"
    "Khadaffy"
    "Khadafy"
    "Khaddafi"
    "Qadafi"
    "Qaddafi"
    "Qadhafi"
    "Qadhaafi"
    "Qadhdhafi"
    "Qadthafi"
    "Qathafi"
    "Quathafi"
    "Qudhafi"
}

: gaddafi? ( string -- ? )
    names member? ;

Regular Expressions

If we wanted to speed it up, we could build a regular expression from all of the names (using literals to build the regular expression at compile time):

USE: regexp

: gaddafi? ( string -- ? )
    $[ names "|" join <regexp> ] matches? ;
Note: the first time you call this method it needs to compile the regular expression and is a bit slow. Subsequent calls are much faster.

One problem with that, is that it doesn't take into account that sometimes he is called "Al Qaddafi" or "el-Gaddafi". We could create our own case-insensitive pattern that attempts to capture all the possible variations of his name:

CONSTANT: names-pattern
R/ ((al|el)[-\s]?)?(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aa?ff?[iy]/i

: gaddafi? ( string -- ? )
    names-pattern matches? ;

An advantage of using regular expressions is that it is easy to take a piece of text and normalize all the mentions of his name:

: normalize-gaddafi ( string -- string' )
    names-pattern "Gaddafi" re-replace ;

Soundex

An interesting idea was proposed on the Hacker News discussion to use Soundex to match names. That might look something like this:

USE: soundex

: gaddafi? ( string -- ? )
    soundex { "G310" "K310" "Q310" "Q331" } member? ;

Some disadvantages of this is that it doesn't capture names with prefix "Al" or "El", and misses some names (e.g., "Kazzafi" has a soundex value of "K210", but that would produce a false match against a name like "KOSOFF").

This problem is made even harder if you want to include all the possible variations of his first name (e.g., Muammar, Moamar, Mo'ammar, Mu'amar, Moamma, etc.), and include text that is "FIRSTNAME LASTNAME" or "LASTNAME, FIRSTNAME".

Sunday, March 20, 2011

Typed Netstrings

A few hours ago, Zed Shaw tweeted about an experiment creating typed netstrings:

An experiment in tagging netstrings with the types of their data as an alternative to JSON: http://codepad.org/xct0E5acless than a minute ago via web


I thought that an implementation in Factor would contrast nicely with Zed's Python version. The basic idea is to support four kinds of data types:

  • Text
  • Numbers
  • Lists
  • Dictionaries (e.g., maps or associations)

First, some imports:

USING: arrays combinators formatting hashtables kernel
math.parser sequences splitting ;

An encoded payload value looks something like "{LENGTH}:{PAYLOAD}{TYPE}". We can write a simple word to parse a string that looks like that into its parts:

: parse-payload ( data -- remain payload payload-type )
    ":" split1 swap string>number cut unclip swapd ;

We can build a simple parser for these "typed netstrings" (deferring for the moment, how we parse lists and dictionaries):

DEFER: parse-dict
DEFER: parse-list

: parse-tnetstring ( data -- remain value )
    parse-payload {
        { CHAR: # [ string>number ] }
        { CHAR: " [ ] }
        { CHAR: } [ parse-dict ] }
        { CHAR: ] [ parse-list ] }
        [ "Invalid payload type: %c" sprintf throw ]
    } case ;

Parsing lists is just repeatedly parsing values until the remainder is exhausted:

: parse-list ( data -- value )
    [ { } ] [
        [ dup empty? not ] [ parse-tnetstring ] produce nip
    ] if-empty ;

Parsing dictionaries is only a little more involved. We need a way to parse successive key/value pairs, checking some simple error conditions:

: parse-pair ( data -- extra value key )
    parse-tnetstring [
        dup [ "Unbalanced dictionary store" throw ] unless
        parse-tnetstring
        dup [ "Invalid value, null not allowed" throw ] unless
    ] dip ;

Then we can build the dictionary, repeatedly parsing key/value pairs:

: parse-dict ( data -- value )
    [ H{ } ] [
        [ dup empty? not ] [ parse-pair swap 2array ] produce
        nip >hashtable
    ] if-empty ;

And, to make the interface easy to use, we wrap our parse-tnetstring word, checking that there was no un-parsed remainder:

: tnetstring ( data -- value )
    parse-tnetstring swap [
        "Had trailing junk: %s" sprintf throw
    ] unless-empty ;

We can show that it works on one of Zed's more complex examples:

( scratchpad ) "34:5:hello\"22:11:12345678901#4:this\"]}" tnetstring .
H{ { "hello" { 12345678901 "this" } } }

The code (and some tests) for this is on Github.

Update: I added support for booleans and implemented writer words to match the reader words above. Everything is on my Github.

Friday, March 18, 2011

Square?

A recent code golf requested a way to determine if 4 points form a square. Factor may not be the shortest answer, but I thought I would contribute it anyway.

We require four unique points to be provided. We then use the distance formula to compute the distances between all pairs of points. For a square, there should be just two lengths (the side and the diagonal) not counting zero (the distance from a point to itself).

USING: kernel math math.combinatorics math.vectors sequences sets ;

: square? ( seq -- ? )
    members [ length 4 = ] [
        2 [ first2 v- [ sq ] map-sum ] map-combinations
        { 0 } diff length 2 =
    ] bi and ;

We can write some unit tests to make sure it works.

USE: tools.test

[ t ] [
    {
        { { 0 0 } { 0 1 } { 1 1 } { 1 0 } }   ! standard square
        { { 0 0 } { 2 1 } { 3 -1 } { 1 -2 } } ! non-axis-aligned square
        { { 0 0 } { 1 1 } { 0 1 } { 1 0 } }   ! different order
        { { 0 0 } { 0 4 } { 2 2 } { -2 2 } }  ! rotated square
    } [ square? ] all?
] unit-test

[ f ] [
    {
        { { 0 0 } { 0 2 } { 3 2 } { 3 0 } }   ! rectangle
        { { 0 0 } { 3 4 } { 8 4 } { 5 0 } }   ! rhombus
        { { 0 0 } { 0 0 } { 1 1 } { 0 0 } }   ! only 2 distinct points
        { { 0 0 } { 0 0 } { 1 0 } { 0 1 } }   ! only 3 distinct points
    } [ square? ] any?
] unit-test

Since it's code golf (fewest characters possible), how might you make it shorter?