Factor has a bit? generic that is used to test an integer to see if a particular bit is set. When operating on fixnum integers, this is implemented by the fixnum-bit?
word:
: fixnum-bit? ( x n -- ? ) { fixnum fixnum } declare dup 0 >= [ neg shift even? not ] [ 2drop f ] if
Many CPU architectures have instructions for performing a Bit Test. On x86, the BT instruction is available. Below, we are going to implement a compiler intrinsic in the hopes of speeding up this operation in Factor.
Intrinsic
In cpu.architecture, add a generic %bit-test
based on our CPU:
HOOK: %bit-test cpu ( dst src1 src2 temp -- )
In cpu.x86, implement %bit-test
on x86 (returning a boolean using a temporary register and the CMOVB instruction to check the carry flag which holds the result of the bit test):
M:: x86 %bit-test ( dst src1 src2 temp -- ) src1 src2 BT dst temp \ CMOVB (%boolean) ;
In compiler.cfg.instructions, add a ##bit-test
instruction:
FOLDABLE-INSN: ##bit-test
def: dst/tagged-rep
use: src1/int-rep src2/int-rep
temp: temp/int-rep ;
In compiler.codegen, we link the ##bit-test
instruction with the %bit-test
word.
CODEGEN: ##bit-test %bit-test
In compiler.cfg.intrinsics, enable replacing fixnum-bit?
with the %bit-test
intrinsic:
: enable-bit-test ( -- ) { { fixnum-bit? [ drop [ ^^bit-test ] binary-op ] } } enable-intrinsics ;
Disassemble
The old assembly using fixnum-bit?
looked like this:
000000010ea00250: mov [rip-0x1ff256], eax 000000010ea00256: sub rsp, 0x8 000000010ea0025a: call 0x10eedf3b0 (integer>fixnum-strict) 000000010ea0025f: mov rax, [r14] 000000010ea00262: cmp rax, 0x0 000000010ea00266: jl 0x10ea002a7 (M\ fixnum bit? + 0x57) 000000010ea0026c: neg rax 000000010ea0026f: mov [r14], rax 000000010ea00272: call 0x10ebec060 (fixnum-shift) 000000010ea00277: mov rax, [r14] 000000010ea0027a: test rax, 0x10 000000010ea00281: mov rax, 0x1 000000010ea0028b: mov rbx, 0x1010eff4c 000000010ea00295: cmovnz rax, rbx 000000010ea00299: mov [r14], rax 000000010ea0029c: mov [rip-0x1ff2a2], eax 000000010ea002a2: add rsp, 0x8 000000010ea002a6: ret 000000010ea002a7: sub r14, 0x8 000000010ea002ab: mov qword [r14], 0x1 000000010ea002b2: mov [rip-0x1ff2b8], eax 000000010ea002b8: add rsp, 0x8 000000010ea002bc: ret
The new assembly looks like this with BT:
000000010e656ec0: mov [rip-0x293ec6], eax 000000010e656ec6: sub rsp, 0x8 000000010e656eca: call 0x10e467ea0 (integer>fixnum-strict) 000000010e656ecf: mov rax, [r14] 000000010e656ed2: mov rbx, [r14-0x8] 000000010e656ed6: sar rbx, 0x4 000000010e656eda: sar rax, 0x4 000000010e656ede: bt rbx, rax 000000010e656ee2: mov rbx, 0x1 000000010e656eec: mov rcx, 0x1018f169c 000000010e656ef6: cmovb rbx, rcx 000000010e656efa: sub r14, 0x8 000000010e656efe: mov [r14], rbx 000000010e656f01: mov [rip-0x293f07], eax 000000010e656f07: add rsp, 0x8 000000010e656f0b: ret
Performance
Turns out that this speeds up fixnum-bit?
by a lot. Using a simple test case:
: bench-fixnum-bit ( x n -- ? ) { fixnum fixnum } declare [ 0 100,000,000 ] 2dip '[ _ _ bit? [ 1 + ] when ] times ;
Our old version was decently fast:
IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time Running time: 0.821433838 seconds
But our new version is much faster!
IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time Running time: 0.239439108 seconds
This has been committed to the development version and is available now.
1 comment:
Hey John, I seem to have misplaced your email. Can you ping me at pwang [at] continuum.io? I'm actually in your neck of the woods for a few days, let me know if you want to grab a beer!
Post a Comment