Re: Factor: May 2012

Tuesday, May 29, 2012

Wake-on-LAN

Wake-on-LAN is an ethernet standard that allows a computer to be turned on or woken up by a network message. This is a kinda-fun-sometimes-useful feature that I thought we should have in Factor.

The way the protocol works is to broadcast a "magic packet" that contains information describing the computer you want to wakeup. From the Wikipedia article:

The magic packet is a broadcast frame containing anywhere within its payload 6 bytes of all 255 (FF FF FF FF FF FF in hexadecimal), followed by sixteen repetitions of the target computer's 48-bit MAC address, for a total of 102 bytes. Since the magic packet is only scanned for the string above, and not actually parsed by a full protocol stack, it may be sent as any network- and transport-layer protocol, although it is typically sent as a UDP datagram to port 7 or 9, or directly over Ethernet as EtherType 0x0842.

First, we need a way to turn a MAC address into an array of bytes:

: mac-address-bytes ( mac-address -- byte-array )
    ":-" split [ hex> ] B{ } map-as ;

Next, we need a way to construct the "magic packet" according to the specification:

: wake-on-lan-packet ( mac-address -- bytearray )
    [ 16 ] [ mac-address-bytes ] bi* <array> concat
    B{ 0xff 0xff 0xff 0xff 0xff 0xff } prepend ;

Now that we have the "magic packet", we can create a datagram port configured to send broadcast packets (using a recent commit that made this easy and cross-platform):

: wake-on-lan ( mac-address broadcast-ip -- )
    [ wake-on-lan-packet ] [ 9 <inet4> ] bi*
    f 0 <inet4> <broadcast> [ send ] with-disposal ;

It's easy enough to test this using Mac OS X since most modern Mac hardware supports Wake-on-LAN. You can configure it via the System Preferences Energy Saver panel, in the Options tab. Marking the "Wake for network access" checkbox enables Wake-on-LAN support. For other platforms, read more about receiving the magic packet.

The code for this is on my Github.

Monday, May 14, 2012

Finding Duplicate Numbers

One of my favorite interview questions goes something like this:

Given an array of 1001 elements which contains integers from 1 to 1000 inclusive. The numbers are randomly stored in the array. Only one number repeats itself. The candidate has to come up with an efficient solution for finding that duplicate given that you can access the elements of an array only once i.e., you can read the elements of the array only once.

I thought I would show some approaches to solving this, using both Python and Factor.

Numbers

First, we need to make a randomized array of our numbers (with one duplicate):

Python:

def make_numbers(n):
    seq = [n] + range(1, 1001)
    random.shuffle(seq)
    return seq

Factor:

: make-numbers ( n -- seq )
    [ 1000 [1,b] ] dip suffix randomize ;

It's worth pointing out that Factor has many "builtin" words that can be helpful for solving this, making the shortest solution:

IN: scratchpad 12 make-numbers duplicates first .
12

Enumeration

While this solution is not linear, it is often the "obvious" first way to solve this problem. It goes something like this: for each element in the list, check if duplicated by any other element:

Python:

def find_duplicate(seq):
    for i, m in enumerate(seq):
        for n in seq[i+1:]:
            if m == n:
                return m
    return False

Factor:

: find-duplicate ( seq -- n )
    dup '[ 1 + _ index-from ] find-index nip ;

Sets

Our first "linear" solution uses sets to track which elements we've seen, stopping when we encounter our first duplicate. We will gloss over the extra memory and time required to maintain the set.

Python:

def find_duplicate(seq):
    d = set()
    for m in seq:
        if m in d:
            return m
        d.add(m)
    return False

Factor:

: find-duplicate ( seq -- n ) 
    f fast-set '[ _ ?adjoin not ] find nip ;

Both versions use a generic hash-set. Knowing that our data is numeric values, we can use a bit-set and gain some performance:

: find-duplicate ( seq -- n )
    dup length <bit-set> '[ _ ?adjoin not ] find nip ;

Number Theory

Taking advantage of the fact that our numbers are from 1 to 1000, we can compute the sum of the numbers, and subtract the expected total (using the formula (n * n+1) / 2) to find the duplicate:

Python:

def find_duplicate(seq):
    n = len(seq) - 1
    return sum(seq) - (n*(n+1))/2

Factor:

: find-duplicate ( seq -- n )
    [ sum ] [ length dup 1 - * 2 / ] bi - ;

What do you think? Any other good, fun, or unusual ways to solve this problem?

Update: Zeev pointed out in the comments that another way would be to XOR all the values in our sequence and the numbers 1 to 1000:

Python:

def find_duplicate(seq):
    n1 = 0
    for x in seq:
        n1 ^= x
    n2 = 0
    for x in range(1, 1001):
        n2 ^= x
    return n1 ^ n2

Factor:

: find-duplicate ( seq -- n )
    1000 [1,b] [ 0 [ bitxor ] reduce ] bi@ bitxor ;

Wednesday, May 9, 2012

Random Name Generator

In the spirit of the (almost) pure random demon name generator, I wanted to show a random name generator in Factor.

Some time ago, I implemented a vocabulary for creating fake data which could generate "fake names". The way it worked was to make a name by picking randomly from a list of valid first and last names. The drawback with that approach is that you can only create names that already exist in your list. It would be more interesting if you can use a list of valid names to "seed" a random name generator which uses that to create names that are similar to your list but do not appear in it.

Transition Tables

To do this, we will create a table of "transitions". A transition is a pair of characters that appear next to each other in a name. In pseudo-code, it will be something like this:

For each name from a list of valid names,
For every character in the name,
Update the table by adding the "transition" at the characters index.

We can create a sequence of transitions for a given string by clumping each pair of characters together (as well as showing the last character transitions to f, meaning the end of the word):

: transitions ( string -- enum )
    { f } { } append-as 2 clump <enum> ;

Given a string and a transition table, we can update it with those transitions:

: (transition-table) ( string assoc -- )
    [ transitions ] dip '[ swap _ push-at ] assoc-each ;

So, given a sequence of strings, we can create the transition table easily:

: transition-table ( seq -- assoc )
    H{ } clone [ [ (transition-table) ] curry each ] keep ;

You can try it out and it will look something like this:

IN: scratchpad { "hello" } transition-table .
H{
    { 0 V{ { CHAR: h CHAR: e } } }
    { 1 V{ { CHAR: e CHAR: l } } }
    { 2 V{ { CHAR: l CHAR: l } } }
    { 3 V{ { CHAR: l CHAR: o } } }
    { 4 V{ { CHAR: o f } } }
}

Generating Names

We can use the make vocabulary to randomly choose the next transition from our table given a previous character, an index, and a transition table:

: next-char, ( prev index assoc -- next )
    at swap [ [ nip = ] curry assoc-filter ] when*
    random [ first , ] [ second ] bi ;

Generating the names is as easy as starting from zero and adding each character until we hit an f indicating the end of a word.

: generate-name ( seq -- name )
    transition-table [
        f 0 [
            [ pick next-char, ] [ 1 + ] bi over
        ] loop 3drop
    ] "" make ;

Generating a number of names is just:

: generate-names ( n seq -- names )
    [ generate-name ] curry replicate ;

Try It

So, does it work? Well, if we load a list of Star Trek races, we can generate some new names that sound pretty good!

IN: scratchpad 10 star-trek-races generate-names .
{
    "Sooyllan"
    "Brakalan"
    "Napl"
    "Drizi"
    "Tevorri"
    "Flladian"
    "Skadi"
    "Cynocak"
    "Pamu"
    "Patha"
}

The code for this is on my Github.