Friday, March 25, 2011

Sum

Today's programming challenge is to implement the "old Unix Sys V R4" sum command:

"The original sum calculated a checksum as the sum of the bytes in the file, modulo 2^16−1, as well as the number of 512-byte blocks the file occupied on disk. Called with no arguments, sum read standard input and wrote the checksum and file blocks to standard output; called with one or more filename arguments, sum read each file and wrote for each a line containing the checksum, file blocks, and filename."

First, some imports:

USING: command-line formatting io io.encodings.binary io.files
kernel math math.functions namespaces sequences ;

Short Version

A quick file-based version might look like this:

: sum-file. ( path -- )
    [
        binary file-contents
        [ sum 65535 mod ] [ length 512 / ceiling ] bi
    ] [ "%d %d %s\n" printf ] bi ;

You can try it out:

( scratchpad ) "/usr/share/dict/words" sum-file.
19278 4858 /usr/share/dict/words
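
To see the two pieces of arithmetic on their own, here is a small sketch that applies them to in-memory sequences instead of a file. The word name checksum+blocks is made up for this example and is not part of the original code; it only wraps the same two quotations used in sum-file. above.

```factor
USING: arrays kernel math math.functions prettyprint sequences ;
IN: scratchpad

! Hypothetical helper (an assumption for illustration): the same
! checksum and block arithmetic as the short version, without the file.
: checksum+blocks ( bytes -- checksum blocks )
    [ sum 65535 mod ] [ length 512 / ceiling ] bi ;

! 1 + 2 + 3 = 6, and three bytes fit in one 512-byte block.
B{ 1 2 3 } checksum+blocks 2array .          ! prints { 6 1 }

! 258 * 255 = 65790, which wraps around to 65790 - 65535 = 255.
258 255 <array> checksum+blocks 2array .     ! prints { 255 1 }

! 513 bytes spill into a second block: 513/512 rounds up to 2.
513 0 <array> checksum+blocks 2array .       ! prints { 0 2 }
```

The second case shows why the modulus matters: once the byte sum passes 65535, the checksum wraps around rather than growing with the file.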

This version has three main drawbacks: it loads the entire file into memory (which might be a problem for big files), it does not print an error if the file is not found, and it does not support standard input.

Full Version

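A more complete version might begin by implementing a word that reads from the current input stream in chunks of up to 65,536 bytes, accumulating the checksum and the byte count, and then converting the byte count into the number of 512-byte blocks: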

: sum-stream ( -- checksum blocks )
    0 0 [ 65536 read-partial dup ] [
        [ sum nip + ] [ length + nip ] 3bi
    ] while drop [ 65535 mod ] [ 512 / ceiling ] bi* ;
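
Because sum-stream reads from whatever the current input stream is, one way to try it without touching the disk is to wrap a byte array in an in-memory reader. The following is a sketch under that assumption; it uses with-byte-reader from io.streams.byte-array, which the post itself does not use, and repeats the definition of sum-stream so the snippet loads on its own.

```factor
USING: arrays io io.encodings.binary io.streams.byte-array kernel
math math.functions prettyprint sequences ;
IN: scratchpad

! sum-stream exactly as defined above, repeated so this loads standalone.
: sum-stream ( -- checksum blocks )
    0 0 [ 65536 read-partial dup ] [
        [ sum nip + ] [ length + nip ] 3bi
    ] while drop [ 65535 mod ] [ 512 / ceiling ] bi* ;

! Treat three bytes as the input stream: the checksum is 1 + 2 + 3 = 6
! and the data fits in a single 512-byte block.
B{ 1 2 3 } binary [ sum-stream ] with-byte-reader 2array .   ! prints { 6 1 }
```

The same word then works unchanged for files and for standard input, since only the surrounding stream changes.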

The output should look like CHECKSUM BLOCKS FILENAME:

: sum-stream. ( path -- )
    [ sum-stream ] dip "%d %d %s\n" printf ;

We can generate output for a particular file (printing FILENAME: not found if the file does not exist):

: sum-file. ( path -- )
    dup exists? [
        dup binary [ sum-stream. ] with-file-reader
    ] [ "%s: not found\n" printf ] if ;

And, to prepare a version of sum that we can deploy as a binary and run from the command line, we build a simple MAIN: word:

: run-sum ( -- )
    command-line get [ "" sum-stream. ] [
        [ sum-file. ] each
    ] if-empty ;

MAIN: run-sum

The code for this is on my GitHub.
