Factor contains a nice DSL for writing assembly code. I thought it would be fun to investigate how it works by accessing the CPU's Time Stamp Counter to estimate CPU speed.
The x86 instruction for accessing the timestamp value (incremented every CPU tick) is called RDTSC (a 2-byte instruction, 0x0f 0x31). Some C code for calling the 32-bit or 64-bit versions of this looks like:
    #if defined(__i386__)

    static __inline__ unsigned long long rdtsc(void)
    {
        unsigned long long int x;
        __asm__ __volatile__ (".byte 0x0f, 0x31" : "=A" (x));
        return x;
    }

    #elif defined(__x86_64__)

    static __inline__ unsigned long long rdtsc(void)
    {
        unsigned long long hi, lo;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
    }

    #endif
Factor provides utilities for calling arbitrary assembly code in the alien vocabulary. Using this, we can create corresponding Factor code (supporting both 32 and 64 bits):
    USING: alien alien.c-types cpu.x86.assembler
    cpu.x86.assembler.operands system ;

    HOOK: rdtsc cpu ( -- n )

    M: x86.32 rdtsc
        longlong { } cdecl [
            RDTSC
        ] alien-assembly ;

    M: x86.64 rdtsc
        longlong { } cdecl [
            RAX 0 MOV
            RDTSC
            RDX 32 SHL
            RAX RDX OR
        ] alien-assembly ;
You can see in the implementation above how Factor uses the type of CPU (contained in the cpu variable) to dispatch on the correct version of the rdtsc word.
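The 64-bit variant has to do a little more work because RDTSC returns the counter split across two registers, with the high 32 bits in EDX and the low 32 bits in EAX. Shifting the high half left by 32 and OR-ing in the low half rebuilds the full value. Here is a minimal C sketch of just that combination step; the names combine_tsc and check_combine_tsc are hypothetical and are not part of Factor or of the C snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit counter from the two 32-bit
   halves that RDTSC leaves in EDX (high) and EAX (low). */
static uint64_t combine_tsc(uint32_t hi, uint32_t lo)
{
    /* Widen to 64 bits before shifting so the high half is not lost. */
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Small self-check using values worked out by hand. */
static void check_combine_tsc(void)
{
    assert(combine_tsc(0x00000001u, 0x00000002u) == 0x0000000100000002ull);
    assert(combine_tsc(0x00000000u, 0xffffffffu) == 0x00000000ffffffffull);
}
```

This is the same shift-and-OR that the RDX 32 SHL and RAX RDX OR lines perform on the registers directly.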
To estimate CPU speed, we will need to define two "benchmarking" words:
- Calculate the number of CPU ticks it takes to execute some Factor code.
- Calculate the time it takes to execute some Factor code.
    USING: kernel math system ;

    : #ticks ( quot -- n )
        rdtsc [ call rdtsc ] dip - ; inline

    : #nanos ( quot -- n )
        nano-count [ call nano-count ] dip - ; inline
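Both words follow the same pattern: read a counter, run the code, read the counter again, and subtract. For readers more at home in C, here is a rough analogue of #nanos, assuming a POSIX system where clock_gettime with CLOCK_MONOTONIC is available. The names now_nanos, elapsed_nanos, and spin are hypothetical, and spin is only a stand-in workload.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds (assumed stand-in for nano-count). */
static uint64_t now_nanos(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical analogue of #nanos: time a single call of fn. */
static uint64_t elapsed_nanos(void (*fn)(void))
{
    uint64_t start = now_nanos();
    fn();
    return now_nanos() - start;
}

/* Stand-in workload; volatile keeps the compiler from deleting the loop. */
static void spin(void)
{
    volatile uint32_t n = 1000000u;
    while (n > 0) {
        n = n - 1;
    }
}
```

Calling elapsed_nanos(spin) returns the wall-clock time of one run of spin in nanoseconds. A #ticks analogue would have the same shape, with the clock read replaced by a read of the time stamp counter.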
We can then create a "busy loop" that runs for some time, then estimates CPU speed as ticks-per-second:
    : busy-loop ( -- )
        100000000 [ 1 - dup 0 > ] loop drop ;

    : cpu-speed ( -- n )
        [ [ busy-loop ] #nanos ] #ticks swap / 1000000000.0 * ;
Running this on my MacBook Pro (with a 2.66 GHz processor) produces this estimate:
    ( scratchpad ) cpu-speed .
    2660324566.190773
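The arithmetic inside cpu-speed is simply ticks divided by nanoseconds, scaled by 10^9 to turn ticks per nanosecond into ticks per second. Here is a small C sketch of that conversion alone; the function name ticks_per_second is hypothetical, and the inputs in the comment are illustrative numbers, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a tick count and the elapsed time in
   nanoseconds into ticks per second (Hz). */
static double ticks_per_second(uint64_t ticks, uint64_t nanos)
{
    assert(nanos > 0);  /* a zero-length interval has no defined rate */
    return (double)ticks / (double)nanos * 1e9;
}

/* Illustration: 5 ticks observed over 2 ns is 2.5 ticks per nanosecond,
   which is 2.5e9 ticks per second. */
```

Note that the estimate includes the overhead of the timing calls themselves, which is one reason to make the busy loop long relative to that overhead.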
The rdtsc and #ticks words are distributed with Factor as instruction-count and count-instructions, respectively, and are available in the cpu.x86.features vocabulary. The #nanos word is called benchmark and is available in the tools.time vocabulary.
1 comment:
Does it mean that all registers are considered as dirty after you use alien-assembly, or are %rax and %rdx caller-save registers?