Friday, June 19, 2015

Bit Test

Factor has a bit? generic that is used to test an integer to see if a particular bit is set. When operating on fixnum integers, this is implemented by the fixnum-bit? word:

: fixnum-bit? ( x n -- ? )
    { fixnum fixnum } declare
    dup 0 >= [ neg shift even? not ] [ 2drop f ] if 

Many CPU architectures have instructions for performing a Bit Test. On x86, the BT instruction is available. Below, we are going to implement a compiler intrinsic in the hopes of speeding up this operation in Factor.

Intrinsic

In cpu.architecture, add a generic %bit-test based on our CPU:

HOOK: %bit-test cpu ( dst src1 src2 temp -- )

In cpu.x86, implement %bit-test on x86 (returning a boolean using a temporary register and the CMOVB instruction to check the carry flag which holds the result of the bit test):

M:: x86 %bit-test ( dst src1 src2 temp -- )
    src1 src2 BT
    dst temp \ CMOVB (%boolean) ;

In compiler.cfg.instructions, add a ##bit-test instruction:

FOLDABLE-INSN: ##bit-test
def: dst/tagged-rep
use: src1/int-rep src2/int-rep
temp: temp/int-rep ;

In compiler.codegen, we link the ##bit-test instruction with the %bit-test word.

CODEGEN: ##bit-test %bit-test

In compiler.cfg.intrinsics, enable replacing fixnum-bit? with the %bit-test intrinsic:

: enable-bit-test ( -- )
    {
        { fixnum-bit? [ drop [ ^^bit-test ] binary-op ] }
    } enable-intrinsics ;

Disassemble

The old assembly using fixnum-bit? looked like this:

000000010ea00250: mov [rip-0x1ff256], eax
000000010ea00256: sub rsp, 0x8
000000010ea0025a: call 0x10eedf3b0 (integer>fixnum-strict)
000000010ea0025f: mov rax, [r14]
000000010ea00262: cmp rax, 0x0
000000010ea00266: jl 0x10ea002a7 (M\ fixnum bit? + 0x57)
000000010ea0026c: neg rax
000000010ea0026f: mov [r14], rax
000000010ea00272: call 0x10ebec060 (fixnum-shift)
000000010ea00277: mov rax, [r14]
000000010ea0027a: test rax, 0x10
000000010ea00281: mov rax, 0x1
000000010ea0028b: mov rbx, 0x1010eff4c
000000010ea00295: cmovnz rax, rbx
000000010ea00299: mov [r14], rax
000000010ea0029c: mov [rip-0x1ff2a2], eax
000000010ea002a2: add rsp, 0x8
000000010ea002a6: ret
000000010ea002a7: sub r14, 0x8
000000010ea002ab: mov qword [r14], 0x1
000000010ea002b2: mov [rip-0x1ff2b8], eax
000000010ea002b8: add rsp, 0x8
000000010ea002bc: ret

The new assembly looks like this with BT:

000000010e656ec0: mov [rip-0x293ec6], eax
000000010e656ec6: sub rsp, 0x8
000000010e656eca: call 0x10e467ea0 (integer>fixnum-strict)
000000010e656ecf: mov rax, [r14]
000000010e656ed2: mov rbx, [r14-0x8]
000000010e656ed6: sar rbx, 0x4
000000010e656eda: sar rax, 0x4
000000010e656ede: bt rbx, rax
000000010e656ee2: mov rbx, 0x1
000000010e656eec: mov rcx, 0x1018f169c
000000010e656ef6: cmovb rbx, rcx
000000010e656efa: sub r14, 0x8
000000010e656efe: mov [r14], rbx
000000010e656f01: mov [rip-0x293f07], eax
000000010e656f07: add rsp, 0x8
000000010e656f0b: ret

Performance

Turns out that this speeds up fixnum-bit? by a lot. Using a simple test case:

: bench-fixnum-bit ( x n -- ? )
    { fixnum fixnum } declare
    [ 0 100,000,000 ] 2dip
    '[ _ _ bit? [ 1 + ] when ] times ;

Our old version was decently fast:

IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time
Running time: 0.821433838 seconds

But our new version is much faster!

IN: scratchpad gc [ 0b101010101 0 bench-fixnum-bit ] time
Running time: 0.239439108 seconds

This has been committed to the development version and is available now.

1 comment:

Peter said...

Hey John, I seem to have misplaced your email. Can you ping me at pwang [at] continuum.io? I'm actually in your neck of the woods for a few days, let me know if you want to grab a beer!