I was reading an article about micro optimizing int to IP address conversions. The author was trying to convert a 32-bit integer representation of an IP address into the more typical string representation using Common Lisp.
This is basically what the standard C library functions inet_ntoa and inet_aton do. I thought it might be fun to implement this in Factor and compare performance with the C versions.
alien
First, lets use the alien FFI vocabulary to allow the C functions to be called from Factor:
FUNCTION: c-string inet_ntoa ( uint32_t addr ) ; FUNCTION: int inet_aton ( c-string s, uint32_t *addr ) ;
We can call inet_ntoa
directly, but to call inet_aton
, we need a simple wrapper that calls it, preserves the result, and checks for success or failure:
: inet-aton ( x -- y ) { uint32_t } [ inet_aton 1 assert= ] with-out-parameters ;
We can test to see that it works:
IN: scratchpad 81952074 inet_ntoa . "74.125.226.4" IN: scratchpad "74.125.226.4" inet-aton . 81952074
Parsing 1 million integers with inet_ntoa
takes 1.016 seconds.
Parsing 1 million IP addresses with inet_aton
takes 0.626 seconds.
simple
Okay, what if we want to implement these ourselves?
Note: Unlike inet_ntoa
(which is in network byte order), we will assume little endian like the original author that spawned this adventure.
Converting an integer into an IP address string by taking each octet of the 32-bit number (based on the pseudocode used in the original article):
: ipv4-ntoa ( integer -- ip ) { 0x1000000 0x10000 0x100 0x1 } [ /i 8 bits number>string ] with map "." join ;
Converting an IP address into an integer is as easy as splitting on the dots and performing the reverse operation:
: ipv4-aton ( ip -- integer ) "." split [ string>number ] map { 0x1000000 0x10000 0x100 0x1 } v. ;
We can test that it works:
IN: scratchpad 1249763844 ipv4-ntoa . "74.125.226.4" IN: scratchpad "74.125.226.4" ipv4-aton . 1249763844
Parsing 1 million integers with ipv4-ntoa
takes 0.653 seconds.
Parsing 1 million IP addresses with ipv4-aton
takes 0.738 seconds.
faster
In the spirit of the original article, we will try some micro-optimizations (with some corresponding loss in readability) including type annotations.
Our versions use the generalized number parsing words, string>number and number>string. Specialized (less general) versions can give us additional performance:
TYPED: byte>string ( byte: fixnum -- str ) $[ 256 iota [ number>string ] map ] nth ; TYPED: string>byte ( str: string -- byte ) 0 [ [ 10 * ] dip CHAR: 0 - + ] reduce ;
We make a few other changes to use shifting and a slightly different approach:
TYPED: ipv4-ntoa2 ( integer: fixnum -- ip ) { -24 -16 -8 0 } [ 8 shift-mod byte>string ] with map "." join ; TYPED: ipv4-aton2 ( ip: string -- integer ) "." split { 24 16 8 0 } [ [ string>byte ] dip shift ] [ + ] 2map-reduce ;
Parsing 1 million integers with ipv4-ntoa2
takes 0.436 seconds!
Parsing 1 million IP addresses with ipv4-aton2
takes 0.496 seconds!
fastest
If we really want to do more micro-optimizations, and produce some ugly but fast code, then we can change the code to ensure more fixnum operations:
TYPED: byte>string2 ( byte: fixnum -- str ) $[ 256 iota [ number>string ] map ] nth-unsafe ; TYPED: string>byte2 ( str: string -- byte ) [ length iota 0 ] keep [ string-nth-fast [ 10 fixnum*fast ] dip CHAR: 0 fixnum-fast fixnum+fast ] curry reduce ; TYPED: ipv4-ntoa3 ( integer: fixnum -- ip ) $[ { -24 -16 -8 0 } [ [ 8 shift-mod ] curry ] map ] cleave [ byte>string2 ] 4 napply 4array "." join ; TYPED: ipv4-aton3 ( ip: string -- integer ) "." split first4 [ string>byte2 ] 4 napply [ 24 16 8 [ fixnum-shift-fast ] tri-curry@ tri* ] dip [ fixnum+fast ] tri@ ;
Parsing 1 million integers with ipv4-ntoa3
takes 0.285 seconds!
Parsing 1 million IP addresses with ipv4-aton3
takes 0.355 seconds!
I have committed more complete versions of ipv4-ntoa
and ipv4-aton
(with support for parsing IPv4 addresses) to the development version of Factor.
Does this mean even the simple version is faster than the Posix C-version?
ReplyDeleteOr is it wrapper cost/not implemented complexity, which would make it non-comparable?
There is a small amount of overhead for calling the C function, but not much. The simple Factor version does seem to be faster than my Posix C version. I haven't investigated it much to understand why, however.
ReplyDelete