Monday, November 29, 2010

Estimating CPU Speed

Factor contains a nice DSL for writing assembly code. I thought it would be fun to investigate how it works by accessing the CPU's Time Stamp Counter to estimate CPU speed.

The X86 instruction for accessing the timestamp value (incremented every CPU tick) is called RDTSC (a 2-byte instruction 0x0f 0x31). Some C code for calling the 32-bit or 64-bit versions of this looks like:

#if defined(__i386__)

static __inline__ unsigned long long rdtsc(void)
    unsigned long long int x;
    __asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
    return x;

#elif defined(__x86_64__)

static __inline__ unsigned long long rdtsc(void)
    unsigned long long hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );


Factor provides utilities for calling arbitrary assembly code in the alien vocabulary. Using this, we can create corresponding Factor code (supporting both 32 and 64 bits):

USING: alien alien.c-types cpu.x86.assembler
cpu.x86.assembler.operands system ;

HOOK: rdtsc cpu ( -- n )

M: x86.32 rdtsc
    longlong { } cdecl [
    ] alien-assembly ;

M: x86.64 rdtsc
    longlong { } cdecl [
        RAX 0 MOV
        RDX 32 SHL
        RAX RDX OR
    ] alien-assembly ;

You can see in the implementation above how Factor uses the type of CPU (contained in the cpu variable) to dispatch on the correct version of the rdtsc word.

To estimate CPU speed, we will need to define two "benchmarking" words:

  1. Calculate the number of CPU ticks it takes to execute some Factor code.
  2. Calculate the time it takes to execute some Factor code.
USING: kernel math system ;

: #ticks ( quot -- n )
    rdtsc [ call rdtsc ] dip - ; inline

: #nanos ( quot -- n )
    nano-count [ call nano-count ] dip - ; inline

We can then create a "busy loop" that runs for some time, then estimates CPU speed as ticks-per-second:

: busy-loop ( -- )
    100000000 [ 1 - dup 0 > ] loop drop ;

: cpu-speed ( -- n )
    [ [ busy-loop ] #nanos ] #ticks swap / 1000000000.0 * ;

Running this on my MacBook Pro (with a 2.66 GHz processor) produces this estimate:

( scratchpad ) cpu-speed .

The rdtsc and #ticks words are distributed with Factor as instruction-count and count-instructions and are available in the cpu.x86.features vocabulary. The #nanos word is called benchmark and is available in the tools.time vocabulary.

1 comment:

Sam said...

Does it mean that all registers are considered as dirty after you use alien-assembly, or are %rax and %rdx caller-save registers?