Wednesday, April 20, 2011

iPhone Backups

A few weeks ago, I noticed a project on Github that extracts text messages from your iPhone and prints them out. More specifically, it accesses the backups that iTunes makes and queries a SQLite database to retrieve the messages. We are going to build something similar using Factor.

First, some vocabularies we will be using:

USING: accessors db db.sqlite db.tuples db.types io.directories
io.files.info io.files.unique io.pathnames kernel sequences
sorting ;

Database

We will need a word to choose the most recently modified file in a directory:

: last-modified ( path -- path' )
    [
        [ file-info modified>> ] sort-with last
    ] with-directory-files ;

The iPhone backups are stored (on Mac OS X) in the ~/Library/Application Support/MobileSync/Backup directory. We can choose the most recent backup:

: last-backup ( -- path )
    home "Library/Application Support/MobileSync/Backup"
    append-path dup last-modified append-path ;

The messages are stored in a SQLite database file with the name 3d0d7e5fb2ce288813306e4d4636395e047a3d28.

: last-messages ( -- path )
    last-backup "3d0d7e5fb2ce288813306e4d4636395e047a3d28"
    append-path ;

Since we will be using a backup file (that might be needed later), we will make make a helper word that copies the SQLite database to a temporary file (so we can't mess up the original) and then opens it:

: <copy-sqlite-db> ( path -- sqlite-db )
    temporary-file [ copy-file ] [ <sqlite-db> ] bi ;

Using this, we can make another helper word that allows us to run arbitrary queries on a copy of the most recent messages database:

: with-messages-db ( quot -- )
    [ last-messages <copy-sqlite-db> ] dip with-db ; inline

Schema

The SQLite database has a message table that contains all the text messages sent or received by your iPhone. Using .schema, we can see what it looks like:

CREATE TABLE message (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT, 
    address TEXT, 
    date INTEGER, 
    text TEXT, 
    flags INTEGER, 
    replace INTEGER, 
    svc_center TEXT, 
    group_id INTEGER, 
    association_id INTEGER, 
    height INTEGER, 
    UIFlags INTEGER, 
    version INTEGER, 
    subject TEXT, 
    country TEXT, 
    headers BLOB, 
    recipients BLOB, 
    read INTEGER
);

Queries

We could use raw SQL queries to retrieve all the messages:

( scratchpad ) [ "select * from message" sql-query ] with-messages-db

Or, we can build a word that performs a SQL query to retrieve just the number of messages, converting the response into a number.

: count-messages ( -- n )
    "select count(*) from message" sql-query
    first first string>number ;

Using it is pretty straightforward:

( scratchpad ) [ count-messages ] with-messages-db

Unfortunately, raw SQL queries are not able to automatically convert the columns from the result set based on the type of data stored. It would be nicer if numbers stored in the database could be retrieved as numbers from Factor. We can instead model the database tables, and then use high-level constructs to query and interact with the database. You can read a nice tutorial to get more information on how db.tuples works.

First, we create a tuple that matches the message table:

TUPLE: message rowid address date text flags replace svc-center
group-id association-id height ui-flags version subject country
headers recipients read? ;

Then, we define a persistence relationship between the message tuple and the message table:

message "message" {
    { "rowid" "ROWID" +db-assigned-id+ }
    { "address" "address" TEXT }
    { "date" "date" INTEGER }
    { "text" "text" TEXT }
    { "flags" "flags" INTEGER }
    { "replace" "replace" INTEGER }
    { "svc-center" "svc_center" TEXT }
    { "group-id" "group_id" INTEGER }
    { "association-id" "association_id" INTEGER }
    { "height" "height" INTEGER }
    { "ui-flags" "UIFlags" INTEGER }
    { "version" "version" INTEGER }
    { "subject" "subject" TEXT }
    { "country" "country" TEXT }
    { "headers" "headers" BLOB }
    { "recipients" "recipients" BLOB }
    { "read?" "read" INTEGER }
} define-persistent

Using this, we can count the number of messages very simply:

: count-messages ( -- n )
    T{ message } count-tuples ;

Or, retrieve all the messages:

: all-messages ( -- messages )
    T{ message } select-tuples ;

Or, retrieve all the messages we've sent:

: sent-messages ( -- messages )
    T{ message { flags 3 } } select-tuples ;

Or, retrieve only the unread messages we've received:

: unread-messages ( -- messages )
    T{ message { flags 2 } { read? 0 } } select-tuples ;

Or, retrieve the messages sent from a particular phone number:

: messages-from ( addr -- messages )
    message new swap >>address select-tuples ;

Or, using group-by, retrieve your messages as groups of conversations:

: all-conversations ( -- conversations )
    all-messages [ group-id>> ] group-by ;

Much more useful, and much cleaner, than using the raw SQL queries on the database. The results are returned as a sequence of message objects that can be manipulated directly within Factor.

Try It

You can use your new query tools, for example, to chart the number of messages you send or receive per day of the week:


The code for this is on my Github.

Sunday, April 17, 2011

Mail with GUI

One of the examples that is used to demonstrate Factor to new users is sending an e-mail. Using the smtp vocabulary, all you need to send an email is:

( scratchpad ) <email>
                   { "jane@foobar.com" } >>to
                   "Up for lunch?" >>subject
                   "At Tracy's." >>body
               send-email
Note: you might need to configure your SMTP settings before this works.

with-ui

That's nice, but wouldn't it be neat if we could use the UI to build a compose window that can send emails in a more graphical manner? Perhaps something like this:


We're going to use words from several vocabularies:

USING: accessors arrays colors.constants kernel smtp ui
ui.commands ui.gadgets ui.gadgets.borders ui.gadgets.buttons
ui.gadgets.editors ui.gadgets.labels ui.gadgets.scrollers
ui.gadgets.tracks ui.pens.solid ;

We define a type of track layout that holds editor gadgets for each of the fields we need to receive input for. We will set the "To:" field to have focus first when the window is displayed.

TUPLE: mail-gadget < track to subject body ;

M: mail-gadget focusable-child* to>> ;

We build the UI using words that layout each of the main features:

: <to> ( mail -- gadget )
    to>> "To:" label-on-left ;

: <subject> ( mail -- gadget )
    subject>> "Subject:" label-on-left ;

: <body> ( mail -- gadget )
    body>> <scroller> COLOR: gray <solid> >>boundary ;

Using the command framework, we can create commands for "Send" and "Cancel" and configure it so a toolbar could be created automatically.

: com-send ( mail -- )
    <email>
        over to>> editor-string 1array >>to
        over subject>> editor-string >>subject
        over body>> editor-string >>body
    send-email close-window ;

: com-cancel ( mail -- )
    close-window ;

mail-gadget "toolbar" f {
    { f com-send }
    { f com-cancel }
} define-command-map

Finally, we make a word that creates our mail gadget:

: <mail-gadget> ( -- gadget )
    vertical mail-gadget new-track
        1 >>fill
        { 10 10 } >>gap

        <editor> >>to
        <editor> >>subject
        <multiline-editor>
            10 >>min-rows
            60 >>min-cols
            >>body

        dup <to>      f track-add
        dup <subject> f track-add
        dup <body>    1 track-add
        dup <toolbar> f track-add ;

And a simple word to open the gadget in a new "compose" window (with a 5-pixel border for aesthetic reasons):

: open-compose-window ( -- )
    <mail-gadget>
        { 5 5 } <border> { 1 1 } >>fill
    "Compose" open-window ;

Bonus

You can even print the mail gadget out in the Listener to see how it looks. Note: it's fully functional, so be careful clicking those buttons!


Some things we could do to improve this simple example:

The code for this is on my Github.

Thursday, April 14, 2011

XKCD

Pretty much everyone loves Randall Munroe's XKCD comic. But, wouldn't it be better if you could read it from your Factor listener? I thought so too!

We are going to build something that lets you do this:


USING: formatting http.client images.http images.viewer kernel
regexp strings ;

We need a word that loads an XKCD comic webpage, extracts out of it the URL to the comic, and then loads it into Factor as an image object (to be displayed). In this case, we build a regular expression to look for the first URL that matches where he hosts his comic images.

: load-comic ( url -- image )
    http-get nip
    R" http://imgs\.xkcd\.com/comics/[^\.]+\.(png|jpg)"
    first-match >string load-http-image ;

Using this, we can display his latest comic strip:

: latest-comic. ( -- )
    "http://xkcd.com" load-comic image. ;

Or, knowing that each comic is numbered, a specific comic from the archive:

: comic. ( n -- )
    "http://xkcd.com/%s/" sprintf load-comic image. ;

Or, like the screenshot above, a random comic from the archives:

: random-comic. ( -- )
    "http://dynamic.xkcd.com/random/comic/" load-comic image. ;

The code for this is on my Github.

Tuesday, April 12, 2011

Wikipedia over DNS

A few days ago, I noticed a project that allowed querying Wikipedia over DNS. The way this works is to use the sub-domain query.wp.dg.cx and parse the DNS "text record", TXT, as defined in RFC 1035.

For example, to query for the term "computer":

$ host -t txt computer.wp.dg.cx
computer.wp.dg.cx descriptive text "A computer is a machine that mani
pulates data according to a list of instructions. The first devices t
hat resemble modern computers date to the mid-20th century (1940\226\
128\1471945), although the computer concept and various machines simi
lar to computers existed" " earlier. Early electronic computers were 
the size of a large room, consuming as much power as several hundred 
modern personal... http://a.vu/w:Computer"

Several months ago, Doug Coleman wrote the dns vocabulary to make DNS requests from Factor. Using it, you can lookup information for a hostname using a "pure Factor" version of host:

( scratchpad ) USE: tools.dns

( scratchpad ) "microsoft.com" host
microsoft.com has address 207.46.197.32
microsoft.com has address 207.46.232.182
microsoft.com mail is handled by 10 mail.messaging.microsoft.com

Yesterday, I spoke to Doug about using the dns vocabulary with the "Wikipedia over DNS" service. He discovered it required a few minor changes to support DNS TXT queries:

With a newer version of Factor that includes those changes, you can now make these queries:

( scratchpad ) USE: dns

( scratchpad ) "computer.wp.dg.cx" TXT.
A computer is a machine that manipulates data according to a list of 
instructions. The first devices that resemble modern computers date 
to the mid-20th century (1940–1945), although the computer concept a
nd various machines similar to computers existed earlier. Early elec
tronic computers were the size of a large room, consuming as much po
wer as several hundred modern personal... http://a.vu/w:Computer

Friday, April 8, 2011

Group By

When dealing with sequences, it can be useful to "group" them based on some criteria into smaller sequences. I couldn't find a word that quite solved my problem in Factor, so I wrote group-by:

USING: assocs kernel sequences ;

: group-by ( seq quot: ( elt -- key ) -- assoc )
    H{ } clone [
        [ push-at ] curry compose [ dup ] prepose each
    ] keep ; inline

Examples

For example, we could use it to split the first 20 numbers into two groups based on whether they are prime or not:

( scratchpad ) USE: math.primes

( scratchpad ) 20 iota [ prime? ] group-by .
H{
    { f V{ 0 1 4 6 8 9 10 12 14 15 16 18 } }
    { t V{ 2 3 5 7 11 13 17 19 } }
}

Or, we could group the subsets of a string by their length:

( scratchpad ) USE: math.combinatorics

( scratchpad ) "abc" all-subsets [ length ] group-by .
H{
    { 0 V{ "" } }
    { 1 V{ "a" "b" "c" } }
    { 2 V{ "ab" "ac" "bc" } }
    { 3 V{ "abc" } }
}

Or, group some random numbers by their bit-count:

( scratchpad ) USE: math.bitwise

( scratchpad ) 20 [ 100 random ] replicate
              [ bit-count ] group-by .
H{
    { 1 V{ 32 1 32 4 } }
    { 2 V{ 12 } }
    { 3 V{ 13 74 56 98 44 } }
    { 4 V{ 83 30 45 46 75 77 } }
    { 5 V{ 61 59 61 } }
    { 6 V{ 63 } }
}

How it works

The Factor code is roughly equivalent to the following Python code:

from collections import defaultdict

def group_by(seq, f):
    d = defaultdict(list)
    for value in seq:
        key = f(value)
        d[key].append(value)
    return d

Let's take it step-by-step. First, start defining a word (function) called group-by.

: group-by

Next, define it to take two arguments, a sequence (list or array) of values and a quotation (anonymous function or lambda) that computes a key for each element, and outputs an assoc (association, map, or dict). Names here are used only for documentation, it could take a foo and bar and return a baz.

( seq quot: ( elt -- key ) -- assoc )

The code inside the word is everything until the ";". We want to output a hashtable, so we first create one by cloning an empty hashtable (H{ }).

H{ } clone 

We will compose a word that duplicates each element to compute a key that is used to push each element into an appropriate bucket (a vector) in the hashtable. The push-at word has the signature ( value key assoc -- ). For example, if grouping by the length of a string, we want something that looks a bit like [ dup length H{ } push-at ]:

[ push-at ] curry compose [ dup ] prepose

We then, apply this quotation to each element in the sequence:

each

And, finally, we want to make sure that the hashtable that we created isn't "consumed", but kept on the stack as a return value.

[ ... ] keep ;

Tuesday, April 5, 2011

Verbosity

One of my favorite features in the Factor programming language is its general concise-ness. Certain problems can be expressed in a very modest amount of code. Mind you, we're not talking about the extremes that some languages go through to be good at Code Golf, but minimized to some extent.

A tutorial was posted a couple days ago that demonstrated how to write a program to generate a "variable length string full of random characters". I wanted to contrast the solution in the tutorial with a simple version written in Factor.

Note: this is not meant as a critique of the tutorial, nor a critique of the C# programming language.

The finished "random string" class (with tutorial comments removed):

using System;
using System.Text;

namespace RandomStringTutorial {

    public class RandomString {

        private char c;
        private int n;
        private Random r = new Random();
        private StringBuilder sb = new StringBuilder();

        public string GenerateString(int Length) {

            for (int i = 0; i < Length; i++) {
                sb.Append(RandomChar());
            }

            return sb.ToString();
        }

        private char RandomChar() {

            n = r.Next(65, 122);

            if (n > 65 && n < 90 || n > 97 && n < 123) {
                c = Convert.ToChar(n);
            } else {
                RandomChar();
            }

            return c;
        }
    }
}

And then a "main" method to run this as a program:

using System;

namespace RandomStringTutorial {

    class Program {

        static void Main() {

            RandomString rs = new RandomString();
            Console.Write(rs.GenerateString(8));
            Console.ReadLine();
        }
    }
}

Doing something similar in Factor, including a main word so that the vocabulary could be deployed as a binary, or run from the command-line:

USING: io kernel literals math.ranges random sequences ;

IN: random-string

CONSTANT: valid-chars $[
    CHAR: A CHAR: Z [a,b] CHAR: a CHAR: z [a,b] append
]

: random-string ( n -- string )
    [ valid-chars random ] "" replicate-as ;

: run-random-string ( -- )
    8 random-string print readln drop ;

MAIN: run-random-string

The code for this is on my Github.

Saturday, April 2, 2011

Powers of 2

I came across a blog post discussing an interview question for developers:

"Write a function to determine if a number is a power of 2."

Subsequently, I noticed a great discussion on StackOverflow discussing methods of solving this problem, and another blog post describing ten ways to do this in C. I've translated a few implementations into Factor to contrast the various approaches. The signature of the words we will create looks like this:

: power-of-2? ( n -- ? )

And some basic test cases used to verify that it works:

[ t ] [ {  1 2 4 1024 } [ power-of-2? ] all? ] unit-test
[ f ] [ { -1 0 3 1025 } [ power-of-2? ] any? ] unit-test

Implementations

We can shift the number to the right, checking to see that the first odd value observed is 1:

: shift-right/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ [ dup even? ] [ 2/ ] while 1 = ] if ;

Or, we can use a virtual sequence of bits and count the number of "on" bits (should be only 1):

: bits/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ make-bits [ t? ] count 1 = ] if ;

Or, we can compute the integer log2 raised to the second power, and compare:

: log2/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ dup log2 2^ = ] if ;

Or, we can calculate the next-power-of-2, and compare:

: next-power/power-of-2? ( n -- ? )
    dup 1 = [ drop t ] [ dup next-power-of-2 = ] if ;

Or, we can compare the number with its two's complement:

: complement/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ dup dup neg bitand = ] if ;

Or, we can decrement the number and compare it with the original:

: decrement/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ dup 1 - bitand zero? ] if ;

Or, we can define a lookup table (using the literals vocabulary to define the table at compile time) holding all possible 64-bit powers of 2 (restricting the range of valid inputs to 64-bits):

CONSTANT: POWERS-OF-2 $[ 64 iota [ 2^ ] map ]

Using this, we can check a given number against all the values in the lookup table:

: check-all/power-of-2? ( n -- ? )
    POWERS-OF-2 member? ;

Or, we can do a linear search, stopping when we see numbers too large:

: linear-search/power-of-2? ( n -- ? )
    POWERS-OF-2 over [ >= ] curry find nip = ;

Or, knowing that the lookup table is sorted, we can do a binary search:

: binary-search/power-of-2? ( n -- ? )
    POWERS-OF-2 sorted-member? ;

Or, we can compute a hash-set (at compile time), and check for membership:

: hash-search/power-of-2? ( n -- ? )
    $[ POWERS-OF-2 fast-set ] in? ;

Or, we can use the integer log2 as an index into the lookup table.

: log-search/power-of-2? ( n -- ? )
    dup 0 <= [ drop f ] [ dup log2 POWERS-OF-2 nth = ] if ;

Testing

We can make a list of all our implementations:

CONSTANT: IMPLEMENTATIONS {
    shift-right/power-of-2?
    bits/power-of-2?
    log2/power-of-2?
    next-power/power-of-2?
    complement/power-of-2?
    decrement/power-of-2?
    check-all/power-of-2?
    linear-search/power-of-2?
    binary-search/power-of-2?
    hash-search/power-of-2?
    log-search/power-of-2?
}

And then test their functionality:

: test-power-of-2 ( -- ? )
    IMPLEMENTATIONS [
        1quotation [ call( n -- ? ) ] curry
        [ {  1 2 4 1024 } swap all? ]
        [ { -1 0 3 1025 } swap any? not ] bi and
    ] all? ;

Sure enough, they seem to work:

( scratchpad ) test-power-of-2 .
t

Performance

We can benchmark the performance of the various implementations operating on 1,000,000 random 32-bit numbers:

: bench-power-of-2 ( -- assoc )
    IMPLEMENTATIONS randomize 20 2^ [ random-32 ] replicate '[
        [ name>> "/" split1 drop ] [
            1quotation [ drop ] compose
            [ each ] curry [ _ ] prepose
            nano-count [ call( -- ) nano-count ] dip -
        ] bi
    ] { } map>assoc ;

Running the benchmark, we see that log2/power-of-2? is the (slightly) fastest version:


The raw numbers from one of my benchmark runs:
( scratchpad ) bench-power-of-2 sort-values .
{
    { "log2" 118107290 }
    { "complement" 119691428 }
    { "decrement" 121455742 }
    { "log-search" 122799186 }
    { "next-power" 127366447 }
    { "shift-right" 137695485 }
    { "binary-search" 204224141 }
    { "check-all" 267042396 }
    { "hash-search" 269629705 }
    { "linear-search" 280441186 }
    { "bits" 1112186059 }
}

Improvement

But, can we do better? We have already created a faster implementation than the math vocabulary, which defines power-of-2? using "decrement". Focusing on that implementation, perhaps we can still introduce some improvements.

We can do less work, by exiting early using a short-circuit combinator if the first test fails:

: decrement+short/power-of-2? ( n -- ? )
    { [ dup 1 - bitand zero? ] [ 0 > ] } 1&& ;

Or, we can add type information, assuming only fixnum values (restricting our possible input values to a 60-bit number between -576,460,752,303,423,488 and 576,460,752,303,423,487):

TYPED: decrement+typed/power-of-2? ( n: fixnum -- ? )
    dup 0 <= [ drop f ] [ dup 1 - bitand zero? ] if ;

Or, if we are okay with restricting the input values, we can try writing it in C:

  1. Build a simple C function in power-of-2.c:
#include <stdint.h>

int64_t isPowerOfTwo (int64_t x)
{
    return ((x > 0) && ((x & (x - 1)) == 0));
}
  1. Build a C library we can use :

$ cc -fno-common -c power-of-2.c
$ cc -dynamiclib -install_name power-of-2.dylib \
     -o power-of-2.dylib power-of-2.o
$ sudo mv power-of-2.dylib /usr/local/lib
  1. Wrap the C library from Factor (using the alien vocabulary):

USING: alien alien.c-types alien.syntax alien.libraries ;

"libpowerof2" "power-of-2.dylib" cdecl add-library

LIBRARY: libpowerof2

FUNCTION: int isPowerOfTwo ( int x ) ;
  1. And, finally, build a Factor word that uses it:
: decrement+alien/power-of-2? ( n -- ? )
    isPowerOfTwo 1 = ;

Running the benchmarks shows the typed version only slightly beating the short-circuit version, with a roughly 10% improvement:

{
    { "decrement+typed" 111711456 }
    { "decrement+short" 112070520 }
    { "decrement+alien" 113014058 }
    { "decrement" 123256748 }
}

Given that we want some ability to generalize our function to all integer inputs, I'd be happy declaring decrement+short/power-of-2? the "winner". Can you do better?

The code for this is on my Github.