Thursday, October 18, 2012

Parsing IPv4 Addresses

I came across an interesting question on StackOverflow about parsing IPv4 addresses. Typically, IPv4 addresses are written with four components (e.g., 192.164.1.55). As the top answer points out, you might be surprised to see how ping interprets addresses that have fewer components:

C:\>ping 1
Pinging 0.0.0.1 with 32 bytes of data:

C:\>ping 1.2
Pinging 1.0.0.2 with 32 bytes of data:

C:\>ping 1.2.3
Pinging 1.2.0.3 with 32 bytes of data:

C:\>ping 1.2.3.4
Pinging 1.2.3.4 with 32 bytes of data:

C:\>ping 1.2.3.4.5
Ping request could not find host 1.2.3.4.5. Please check the name and try again.

C:\>ping 255
Pinging 0.0.0.255 with 32 bytes of data:

C:\>ping 256
Pinging 0.0.1.0 with 32 bytes of data:

In fact, you can reach google.com using its IP address written as dotted decimal (74.125.226.4), flat decimal (1249763844), dotted octal (0112.0175.0342.0004), flat octal (011237361004), dotted hex (0x4A.0x7D.0xE2.0x04), flat hex (0x4A7DE204), or even a mix of these (74.0175.0xe2.4).
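
As a quick sanity check on those notations, here is a minimal Factor sketch that folds the four octets into one 32-bit integer and prints it in other bases. The word octets>integer is a made-up helper for this illustration, not part of any vocabulary, and the last line assumes a Factor build where string>number accepts the 0x prefix.

```factor
USING: kernel math math.parser prettyprint sequences ;
IN: scratchpad

! Hypothetical helper: combine octets into one integer,
! most significant octet first.
: octets>integer ( seq -- n )
    0 [ swap 256 * + ] reduce ;

{ 74 125 226 4 } octets>integer .   ! 1249763844 (flat decimal)
1249763844 >hex .                   ! "4a7de204" (flat hex, no prefix)
1249763844 >oct .                   ! "11237361004" (flat octal, no prefix)
"0x4A7DE204" string>number .        ! 1249763844
```

Every notation in the list above is just a different spelling of this one 32-bit number.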

Implementation

Of course, my first thought was that Factor should have a parser that works similarly (especially since I implemented support for the ping protocol a while ago). We want a parse-ipv4 word that takes a string representing the address and returns an IPv4 address string with the typical four components.

First, we need a word to split a string into numeric components and a word to join them back together:

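USING: combinators combinators.short-circuit kernel locals math
math.parser sequences splitting ;

! Split on "." and parse each component as a number.
: split-components ( str -- array )
    "." split [ string>number ] map ;

! Convert each component back to a string and rejoin with ".".
: join-components ( array -- str )
    [ number>string ] map "." join ;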

Then, we can parse the address by padding the missing components with zeros, keeping the last component in the last position:

: parse-ipv4 ( str -- ip )
    split-components dup length {
        { 1 [ { 0 0 0 } prepend ] }
        { 2 [ first2 [| A D | { A 0 0 D } ] call ] }
        { 3 [ first3 [| A B D | { A B 0 D } ] call ] }
        { 4 [ ] }
    } case join-components ;
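
To see the padding step in isolation, here is a small sketch that can be pasted into the listener. The first line is the same locals-based quotation used in the two-component branch. The other two lines are a possible alternative without locals, using dip and 4array. That alternative is my own suggestion for comparison, not what the code above does.

```factor
USING: arrays kernel locals prettyprint sequences ;

! Locals version, as in the 2-component branch.
{ 1 2 } first2 [| A D | { A 0 0 D } ] call .   ! { 1 0 0 2 }

! Alternative sketch: slip the zeros in under the last component,
! then collect the four stack items into an array.
{ 1 2 } first2 [ 0 0 ] dip 4array .            ! { 1 0 0 2 }
{ 1 2 3 } first3 [ 0 ] dip 4array .            ! { 1 2 0 3 }
```

The locals version makes the target layout visible at a glance, which is probably why it reads better here despite being slightly longer.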

Extras

If we want to support octal addresses, we can convert an octal number like 0112 to something Factor can easily parse (0o112) in our splitting code:

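! Rewrite a leading-zero component such as 0112 to 0o112. A lone "0" and
! anything starting with "0x" are left as they are.
: cleanup-octal ( str -- str )
    dup { [ length 1 > ] [ "0" head? ] [ "0x" head? not ] } 1&&
    [ 1 tail "0o" prepend ] when ;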

: split-components ( str -- array )
    "." split [ cleanup-octal string>number ] map ;

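The rewrite is needed because string>number does not treat a bare leading zero as an octal marker. A short listener sketch of the three cases, assuming a Factor build that understands the 0o and 0x prefixes:

```factor
USING: math.parser prettyprint ;

"0112" string>number .    ! 112: the leading zero is ignored (decimal)
"0o112" string>number .   ! 74: explicit octal prefix
"0x4A" string>number .    ! 74: hex already works without any cleanup
```
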
And if we want to support the "carry propagation" which allows 256 to mean 0.0.1.0, we need to "bubble" the array before joining:

: bubble ( array -- array' )
    reverse 0 swap [ + 256 /mod ] map reverse nip ;

: join-components ( array -- str )
    bubble [ number>string ] map "." join ;
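
To make the carry concrete, here is a self-contained sketch that repeats the bubble definition from above so it runs on its own, then applies it to a few padded arrays. The inputs are only illustrative.

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

! Same definition as above: walk from the last component to the
! first, keeping value mod 256 and carrying the quotient leftwards.
: bubble ( array -- array' )
    reverse 0 swap [ + 256 /mod ] map reverse nip ;

{ 0 0 0 256 } bubble .          ! { 0 0 1 0 }
{ 0 0 0 1249763844 } bubble .   ! { 74 125 226 4 }
{ 1 0 0 258 } bubble .          ! { 1 0 1 2 }
{ 256 0 0 0 } bubble .          ! { 0 0 0 0 }
```

Note the last case: the final carry is dropped by nip, so in this sketch a value that does not fit in 32 bits wraps around silently instead of raising an error.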

This (along with some error handling) has been committed to the Factor repository in the ip-parser vocabulary. If it proves useful, it might be nice to change io.sockets to use this when resolving IPv4 addresses...
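
If you want to try it, the ping transcript at the top of this post translates directly into unit tests. This is a sketch that assumes your Factor build ships the ip-parser vocabulary and that parse-ipv4 still returns a dotted string as described here. The expected values are taken from the ping output, not from the vocabulary's own test suite.

```factor
USING: ip-parser tools.test ;

{ "0.0.0.1" } [ "1" parse-ipv4 ] unit-test
{ "1.0.0.2" } [ "1.2" parse-ipv4 ] unit-test
{ "1.2.0.3" } [ "1.2.3" parse-ipv4 ] unit-test
{ "1.2.3.4" } [ "1.2.3.4" parse-ipv4 ] unit-test
{ "0.0.0.255" } [ "255" parse-ipv4 ] unit-test
{ "0.0.1.0" } [ "256" parse-ipv4 ] unit-test
```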
