Sunday, February 9, 2014

inet_ntoa and inet_aton

I was reading an article about micro optimizing int to IP address conversions. The author was trying to convert a 32-bit integer representation of an IP address into the more typical string representation using Common Lisp.

This is basically what the standard C library functions inet_ntoa and inet_aton do. I thought it might be fun to implement this in Factor and compare performance with the C versions.

alien

First, lets use the alien FFI vocabulary to allow the C functions to be called from Factor:

FUNCTION: c-string inet_ntoa ( uint32_t addr ) ;

FUNCTION: int inet_aton ( c-string s, uint32_t *addr ) ;

We can call inet_ntoa directly, but to call inet_aton, we need a simple wrapper that calls it, preserves the result, and checks for success or failure:

: inet-aton ( x -- y )
    { uint32_t } [ inet_aton 1 assert= ] with-out-parameters ;

We can test to see that it works:

IN: scratchpad 81952074 inet_ntoa .
"74.125.226.4"

IN: scratchpad "74.125.226.4" inet-aton .
81952074

Parsing 1 million integers with inet_ntoa takes 1.016 seconds.

Parsing 1 million IP addresses with inet_aton takes 0.626 seconds.

simple

Okay, what if we want to implement these ourselves?

Note: Unlike inet_ntoa (which is in network byte order), we will assume little endian like the original author that spawned this adventure.

Converting an integer into an IP address string by taking each octet of the 32-bit number (based on the pseudocode used in the original article):

: ipv4-ntoa ( integer -- ip )
    { 0x1000000 0x10000 0x100 0x1 }
    [ /i 8 bits number>string ] with map "." join ;

Converting an IP address into an integer is as easy as splitting on the dots and performing the reverse operation:

: ipv4-aton ( ip -- integer )
    "." split [ string>number ] map
    { 0x1000000 0x10000 0x100 0x1 } v. ;

We can test that it works:

IN: scratchpad 1249763844 ipv4-ntoa .
"74.125.226.4"

IN: scratchpad "74.125.226.4" ipv4-aton .
1249763844

Parsing 1 million integers with ipv4-ntoa takes 0.653 seconds.

Parsing 1 million IP addresses with ipv4-aton takes 0.738 seconds.

faster

In the spirit of the original article, we will try some micro-optimizations (with some corresponding loss in readability) including type annotations.

Our versions use the generalized number parsing words, string>number and number>string. Specialized (less general) versions can give us additional performance:

TYPED: byte>string ( byte: fixnum -- str )
    $[ 256 iota [ number>string ] map ] nth ;

TYPED: string>byte ( str: string -- byte )
    0 [ [ 10 * ] dip CHAR: 0 - + ] reduce ;

We make a few other changes to use shifting and a slightly different approach:

TYPED: ipv4-ntoa2 ( integer: fixnum -- ip )
    { -24 -16 -8 0 } [ 8 shift-mod byte>string ] with map
    "." join ;

TYPED: ipv4-aton2 ( ip: string -- integer )
    "." split { 24 16 8 0 }
    [ [ string>byte ] dip shift ] [ + ] 2map-reduce ;

Parsing 1 million integers with ipv4-ntoa2 takes 0.436 seconds!

Parsing 1 million IP addresses with ipv4-aton2 takes 0.496 seconds!

fastest

If we really want to do more micro-optimizations, and produce some ugly but fast code, then we can change the code to ensure more fixnum operations:

TYPED: byte>string2 ( byte: fixnum -- str )
    $[ 256 iota [ number>string ] map ] nth-unsafe ;

TYPED: string>byte2 ( str: string -- byte )
    [ length iota 0 ] keep [
        string-nth-fast
        [ 10 fixnum*fast ] dip CHAR: 0 fixnum-fast fixnum+fast
    ] curry reduce ;

TYPED: ipv4-ntoa3 ( integer: fixnum -- ip )
    $[ { -24 -16 -8 0 } [ [ 8 shift-mod ] curry ] map ] cleave
    [ byte>string2 ] 4 napply 4array "." join ;

TYPED: ipv4-aton3 ( ip: string -- integer )
    "." split first4 [ string>byte2 ] 4 napply
    [ 24 16 8 [ fixnum-shift-fast ] tri-curry@ tri* ] dip
    [ fixnum+fast ] tri@ ;

Parsing 1 million integers with ipv4-ntoa3 takes 0.285 seconds!

Parsing 1 million IP addresses with ipv4-aton3 takes 0.355 seconds!

I have committed more complete versions of ipv4-ntoa and ipv4-aton (with support for parsing IPv4 addresses) to the development version of Factor.

2 comments:

Profpatsch said...

Does this mean even the simple version is faster than the Posix C-version?

Or is it wrapper cost/not implemented complexity, which would make it non-comparable?

mrjbq7 said...

There is a small amount of overhead for calling the C function, but not much. The simple Factor version does seem to be faster than my Posix C version. I haven't investigated it much to understand why, however.