Friday, April 26, 2013

Terminfo

While investigating how to determine if a terminal is color capable, I re-discovered terminfo databases. These database files store the capabilities of terminals in a device-independent manner.

tput

The simple answer is to use the tput program to lookup the terminal functionality (using the TERM environment variable):

$ TERM=xterm-256color tput colors
256

$ TERM=xterm tput colors
8

$ TERM=vt100 tput colors
-1

If you trace the system calls that tput makes, you will see that it is loading a terminfo file to provide the answer:

...
stat64("/usr/share/terminfo\0", 0x7FFF5F21A120, 0x7FB279403AD0)
access("/usr/share/terminfo/78/xterm-256color\0", 0x4, 0xE)
open("/usr/share/terminfo/78/xterm-256color\0", 0x0, 0x0)
read(0x3, "\032\001%\0", 0x1001)
...

I wanted to have access to these capabilities from Factor, without running tput, and chose instead to directly parse the terminfo files.

terminfo

The compiled terminfo file is created by the tic program, and begins with a header containing six two-byte short integers:

  1. the magic number (octal 0432)
  2. the size, in bytes, of the names section
  3. the number of bytes in the boolean section
  4. the number of short integers in the numbers section
  5. the number of offsets (short integers) in the strings section
  6. the size, in bytes, of the string table

We can parse this header pretty easily using the pack vocabulary:

TUPLE: terminfo-header names-bytes boolean-bytes #numbers
#strings string-bytes ;

C: <terminfo-header> terminfo-header

: read-header ( -- header )
    12 read "ssssss" unpack-le unclip
    0o432 = [ "bad magic" throw ] unless
    5 firstn <terminfo-header> ;

The names section comes next, containing the various names for the terminal separated by a "|" character and terminated by a NUL byte ("0"):

: read-names ( header -- names )
    names-bytes>> read but-last "|" split [ >string ] map ;

The boolean section is stored as one byte per boolean flag, either a 0 or 1:

: read-booleans ( header -- booleans )
    boolean-bytes>> read [ 1 = ] { } map-as ;

The number section is stored as a sequence of two-byte short integers, aligned to an even byte (meaning if the name and boolean sections consume an "odd" number of bytes, an extra byte is inserted that should be skipped over to ensure the numbers start on an even byte):

: read-shorts ( n -- seq' )
    2 * read 2 <groups> [ signed-le> dup 0 < [ drop f ] when ] map ;

: align-even-bytes ( header -- )
    [ names-bytes>> ] [ boolean-bytes>> ] bi + odd?
    [ read1 drop ] when ;

: read-numbers ( header -- numbers )
    [ align-even-bytes ] [ #numbers>> read-shorts ] bi ;

The strings are more complex, stored in two sections. The first section is a sequence of two-byte short integers and the second section is a sequence of bytes. To rebuild the string capabilities, interpret the integers as an offset into the string table:

: read-strings ( header -- strings )
    [ #strings>> read-shorts ] [ string-bytes>> read ] bi '[
        [ _ 0 2over index-from swap subseq >string ] [ f ] if*
    ] map ;

Putting this all together, we can "parse" our terminfo file into an object:

TUPLE: terminfo names booleans numbers strings ;

C: <terminfo> terminfo

: read-terminfo ( -- terminfo )
    read-header {
        [ read-names ]
        [ read-booleans ]
        [ read-numbers ]
        [ read-strings ]
    } cleave <terminfo> ;

Finally, we can write a parsing word to convert a terminfo file into a terminfo object:

: file>terminfo ( path -- terminfo )
    binary [ read-terminfo ] with-file-reader ;

/usr/share/terminfo

The terminfo files are stored in /usr/share/terminfo. If we wanted to get a list of all available terminfo files, we can just list this directory:

MEMO: terminfo-names ( -- names )
    "/usr/share/terminfo" [
        [ directory-files ] map concat
    ] with-directory-files ;

If instead, we wanted to lookup a specific terminal, we can map the name of the terminal to a directory. On Mac OS X, these are stored in a sub-directory with the hexadecimal representation of the first byte in the string. On Linux, the first character is the name of the sub-directory:

HOOK: terminfo-path os ( name -- path )

M: macosx terminfo-path ( name -- path )
    [ first >hex ] keep "/usr/share/terminfo/%s/%s" sprintf ;

M: linux terminfo-path ( name -- path )
    [ first ] keep "/usr/share/terminfo/%c/%s" sprintf ;

Success!

With just this much implemented, we can lookup our "max_colors" attribute, knowing it is the 14th number in the numbers table:

: max-colors ( name -- n/f )
    terminfo-path file>terminfo numbers>> 13 swap ?nth ;

IN: scratchpad "xterm-256color" max-colors .
256

IN: scratchpad "xterm" max-colors .
8

IN: scratchpad "vt100" max-colors .
f

I added support for parsing all the capabilities into a hashtable, and allowing named lookup (rather than needing to know the offset like we used above).

This is available now in the terminfo vocabulary.

Saturday, April 20, 2013

Factor 0.96 now available

"You're smart too late and old too soon." - Mike Tyson

I'm very pleased to announce the release of Factor 0.96!

OS/CPUWindowsMac OS XLinux
x86
x86-64

Source code: 0.96

This release is brought to you with over 1100 commits by the following individuals:

Alex Vondrak, Benjamin Pollack, Daniel Nagel, Doug Coleman, John Benediktsson, Jon Harper, Michael T. Richter, and @PGGB.

Aside from bug fixes and various library improvements, I want to highlight the following changes:

  • Major compiler improvements (thanks Alex Vondrak!):
    • Global Value Numbering (disabled currently by default)
    • Parallel-Copy Semantics
  • Performance improvements to hashtables, hash-sets, heaps, and bloom-filters.
  • Support for Retina Displays on OS X
  • Greatly improved table gadget performance
  • PDF streams (and related PDF documentation)!
  • Speed up math comparisons
  • Support resize-window (thanks Jon Harper!)
  • New logo and icons for Factor
  • Added editor support for Textadept, Sublime Text, Geany, BBEdit, and XCode.

Some possible backwards compatibility issues:

  • Changed <groups>, <clumps>, and <circular-clumps> to use slices.
  • Removed <slicing-groups>, <slicing-clumps>, <slicing-circular-clumps>.
  • Renamed editors.notepadpp to editors.notepad++.

Since quite a few performance improvements have been made, I thought I would highlight some improvements using our benchmarks against version 0.95. The chart below shows benchmark time in 0.96 as a percent of the time it took in 0.95, lower is better:


What is Factor

Factor is a concatenative, stack-based programming language with high-level features including dynamic types, extensible syntax, macros, and garbage collection. On a practical side, Factor has a full-featured library, supports many different platforms, and has been extensively documented.

The implementation is fully compiled for performance, while still supporting interactive development. Factor applications are portable between all common platforms. Factor can deploy stand-alone applications on all platforms. Full source code for the Factor project is available under a BSD license.


New Libraries

Monday, April 15, 2013

Factorial

Calculating factorial numbers is frequently used to compare programming languages. Usually the implementation is simple, as is the one in Factor:

MEMO: factorial ( n -- n! )
    dup 1 > [ [1,b] product ] [ drop 1 ] if ;

I hadn't realized until I skimmed the Wikipedia article on Factorials that there are actually many more kinds of factorials and what Factor really needed was implementations of all of them!

Lots of Factorials

primorial

Similar to the factorial, the primorial is the product of the first n prime numbers.

double-factorial

The product of all the odd integers up to some odd positive integer n is called the double factorial of n, and denoted by n!!.

multifactorial

The multifactorial is a product of integers in steps of two (n!!, the "double factorial"), three (n!!!), or more (in general for a given k step: n!(k)).

quadruple-factorial

The so-called quadruple factorial, however, is not the multifactorial n!(4); it is a much larger number given by (2n)!/n!.

super-factorial

The super factorial is the product of the first n factorials.

hyper-factorial

The hyper factorial is defined as:

alternating-factorial

The alternating factorial is the absolute value of the alternating sum of the first n factorials.

exponential-factorial

The exponential factorial is a positive integer n raised to the power of n−1, which in turn is raised to the power of n − 2, and so on and so forth.

Be careful with n > 4: the exponential factorial of 5 is 5262144 which is approximately 6.206069878660874 × 10183230.

factorial-prime?

A factorial prime is a prime number that is one less or one more than a factorial.

primorial-prime?

A primorial prime is a prime number that is one less or one more than a primorial.

falling-factorial

The "descending factorial", "falling sequential product", or "lower factorial".

factorial-power

A generalized version of falling factorial.

rising-factorial

The "ascending factorial", "rising sequential product", or "upper factorial".

Now available!

That's probably more factorials than anyone really cares to know about, but now Factor has more than ten times as many factorials as before! The code could probably use some cleanup, but it is available now in the math.factorials vocabulary.

Thursday, April 11, 2013

Bitwise permutations

Using bit twiddling hacks can be fun. When I noticed compute the lexicographically next bit permutation, I knew it could be a neat feature for Factor.

The idea is that given a number like 7 (represented as 00111 in bits), the next few lexicographic permutations of numbers with 3 bits set would be 01011, 01101, 01110, 10011, 10101, 10110.

Next Permutation

The "bit hacks" website uses this C code to calculate the next permutation of bits:

unsigned int v; // current permutation of bits
unsigned int w; // next permutation of bits

unsigned int t = (v | (v - 1)) + 1;
w = t | ((((t & -t) / (v & -v)) >> 1) - 1);

A direct translation into Factor looks like this:

: next-permutation-bits ( v -- w )
    [ dup 1 - bitor 1 + dup ] keep
    [ dup neg bitand ] bi@ /i 2/ 1 - bitor ;

Permutation Words

Factor has a math.combinatorics vocabulary containing useful operations on permutations and combinations for sequences. Now that we have a way of obtaining the next bitwise permutation of a number, I thought we should have similar words for operating on permutations of bits.

A helper word takes bit-count (the number of set bits) and bits (the overall number of bits), and quot (a quotation to apply to each permutation), returning an initial starting permutation, a predicate to continue as long as we are generating valid numbers, and a body that applies the quotation then calculates the next permutation in the sequence:

: permutation-bits-quot ( bit-count bits quot -- n pred body )
    [ [ on-bits dup '[ dup _ >= ] ] [ on-bits ] bi* ] dip swap
    '[ _ [ next-permutation-bits _ bitand ] bi ] ; inline

That might look a little complicated, but it makes these operations fairly simple:

: each-permutation-bits ( bit-count bits quot: ( n -- ) -- )
    permutation-bits-quot while drop ; inline

: map-permutation-bits ( bit-count bits quot: ( n -- m ) -- seq )
    permutation-bits-quot [ swap ] compose produce nip ; inline

: filter-permutation-bits ( bit-count bits quot: ( n -- ? ) -- seq )
    selector [ each-permutation-bits ] dip ; inline

: all-permutation-bits ( bit-count bits -- seq )
    [ ] map-permutation-bits ;

: find-permutation-bits ( bit-count bits quot: ( n -- ? ) -- elt/f )
    [ f f ] 3dip [ 2nip ] prepose [ keep swap ] curry
    permutation-bits-quot [ [ pick not and ] compose ] dip
    while drop swap and ; inline

Try It

You can then do things like this:

! find all 5-bit numbers with 3 bits set
IN: scratchpad 3 5 all-permutation-bits .
{ 7 11 13 14 19 21 22 25 26 28 }

! find the first 5-bit number with 3 bits set, multiples of 5
IN: scratchpad 3 5 [ 5 divisor? ] find-permutation-bits .
25

! print all the even 5-bit numbers with 3 bits set
IN: scratchpad 3 5 [ even? ] filter-permutation-bits [ .b ] each
01110
10110
11010
11100

This is available in the main repository in the math.combinatorics.bits vocabularly.

Tuesday, April 2, 2013

move

I've used Factor to build several common unix programs including copy, cat, fortune, wc, and others.

Today, I wanted to show how to build the mv ("move") program using the simple file manipulation words available in Factor. If we look at the man page, we can see that its usage is two-fold:

  1. Rename a source file to a destination file
  2. Move source file(s) to a destination directory

We can make a nice usage string to display if the arguments are not correct:

: usage ( -- )
    "Usage: move source ... target" print ;

Moving files into a directory (and displaying the usage if the destination is not a directory):

: move-to-dir ( args -- )
    dup last file-info directory?
    [ unclip-last move-files-into ] [ drop usage ] if ;

If we specify two arguments, we are either moving a single file into a directory, or renaming a file using the move-file word:

: move-to-file ( args -- )
    dup last file-info directory?
    [ move-to-dir ] [ first2 move-file ] if ;

A "main" word checks the number of arguments and performs the appropriate action:

: run-move ( -- )
    command-line get dup length {
        { [ dup 2 > ] [ drop move-to-dir  ] }
        { [ dup 2 = ] [ drop move-to-file ] }
        [ 2drop usage ]
    } cond ;

MAIN: run-move

An improvement that could be made would be producing better error messages when files don't exist. Something like the errors mv produces:

$ mv src dst
mv: rename src to dst: No such file or directory

The code for this is available on my Github.