Sunday, February 12, 2023

Magic

Ever wonder what the type of a particular binary file is? Or wonder how a program knows that a particular binary file is in a compatible file format? One way is to look at the magic number used by the file format in question. You can see some examples in a list of file signatures.
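To make the idea concrete, here is a minimal C sketch that reads the first few bytes of a file and compares them against a small table of signatures. The signature table and the identify function are hypothetical illustrations, not part of libmagic. Real tools like file use a much larger database and many more checks than a simple prefix match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A hypothetical, hand-picked table of file signatures ("magic numbers").
   Each entry is a byte prefix that files of that format start with. */
struct signature {
    const char *name;
    const char *bytes;
    size_t length;
};

static const struct signature signatures[] = {
    { "PNG image", "\x89PNG\r\n\x1a\n", 8 },
    { "ELF binary", "\x7f" "ELF", 4 },   /* split so "EF" isn't read as hex */
    { "Mach-O 64-bit (little-endian)", "\xcf\xfa\xed\xfe", 4 },
    { "PDF document", "%PDF-", 5 },
    { "GIF image", "GIF8", 4 },
};

/* Return the name of the first signature the file starts with, "data" if
   none match, or NULL if the file cannot be opened. */
const char *identify(const char *path) {
    assert(path != NULL);
    unsigned char header[16];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);

    for (size_t i = 0; i < sizeof signatures / sizeof signatures[0]; i++) {
        const struct signature *s = &signatures[i];
        /* only compare when enough bytes were read */
        if (n >= s->length && memcmp(header, s->bytes, s->length) == 0)
            return s->name;
    }
    return "data";  /* like file, fall back to "data" for unknown content */
}
```

Falling back to "data" when nothing matches mirrors what the file command prints for factor.image below.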

On most Unix systems the file command is built on the libmagic library (Apple macOS is an exception, with its own implementation), which uses magic numbers and other techniques to identify file types. You can see how it works through a few examples:

$ file vm/factor.hpp
vm/factor.hpp: C++ source text, ASCII text

$ file Factor.app/Contents/Info.plist 
Factor.app/Contents/Info.plist: XML  document text

$ file factor
factor: Mach-O 64-bit executable x86_64

$ file factor.image
factor.image: data

Wrapping the C library

I am going to show how to wrap a C library using the alien vocabulary, which provides Factor's FFI (foreign function interface). The man pages for libmagic show us some of the functions available in magic.h.

The libmagic library needs to be made available to the Factor instance:

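"magic" {
    { [ os macosx? ] [ "libmagic.dylib" ] }  ! shared library name on macOS
    { [ os unix? ] [ "libmagic.so" ] }       ! shared library name on other Unix systems
} cond cdecl add-library                     ! register it using the C calling convention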

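We start by telling the FUNCTION: declarations that follow to look up their symbols in that library, and by defining an opaque type for magic_t:

LIBRARY: magic

TYPEDEF: void* magic_t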

There are functions for opening a magic_t, loading a magic database into it (passing a NULL path, or f in Factor, loads the default database), and closing it:

FUNCTION: magic_t magic_open ( int flags )

FUNCTION: int magic_load ( magic_t magic, c-string path )

FUNCTION: void magic_close ( magic_t magic )

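It is convenient to declare the close function as a destructor for use in a with-destructors form. DESTRUCTOR: generates &magic_close, which closes the value when the enclosing with-destructors form exits, and |magic_close, which closes it only if an error is thrown: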

DESTRUCTOR: magic_close

Finally, magic_file "returns a textual description of the contents of the filename argument", which gives us the ability of the file command shown above:

FUNCTION: c-string magic_file ( magic_t magic, c-string path )

That should be everything we need to continue...

Using the C library

Now that the raw C library is available as Factor words, we can build a simpler interface on top of it: a single word that guesses the type of a file:

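: guess-file ( path -- result )
    [
        normalize-path                ! make the path absolute
        0 magic_open &magic_close     ! no flags; close it when done
        [ f magic_load drop ]         ! load the default database, ignore the status
        [ swap magic_file ] bi        ! describe the file as a string
    ] with-destructors ;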

And we can then try it on a few files:

IN: scratchpad "vm/factor.hpp" guess-file .
"C++ source, ASCII text"

IN: scratchpad "Factor.app/Contents/Info.plist" guess-file .
"XML 1.0 document, Unicode text, UTF-8 text"

IN: scratchpad "factor" guess-file .
"symbolic link to Factor.app/Contents/MacOS/factor"

IN: scratchpad "factor.image" guess-file .
"data"

This has been available for a while in the magic vocabulary, which adds improved error checking and options to guess the MIME type of files.
