Sometimes profiling an application will show hotspots in the oddest places. For some types of network applications that process huge volumes of events, calls to
gettimeofday() can become a bottleneck. If each event needs to have a timestamp generated, this could mean thousands of system calls in a very short time, all producing essentially the same value. If the "actual" time is not so important, performance could be gained by relaxing the requirement that all events received in a
select() loop have the same timestamp.
One way to do this is to cache the timestamp after waking up, and use that timestamp for all events received within the I/O loop. Unfortunately, the I/O loop can take a long time to process - resulting in timestamps that diverge from "actual" time by small or large (and unpredictable) amounts. Perhaps a better way would be to cache the result of
gettimeofday() with a resolution of one millisecond.
gettimeofday() is called by the
now word from the
calendar vocabulary. Let's try and build a
fast-now word that adds caching:
USING: calendar kernel math namespaces system ;
Our cache resolution is one millisecond (or 1,000,000 nanoseconds):
CONSTANT: cache-duration 1000000
We will be keeping the cached value and the expiration time:
SYMBOL: cache-value SYMBOL: cache-until
Given the current time in nanoseconds, we can check to see if the cached value has expired:
: cache-expired? ( nanos -- ? ) cache-until get-global 0 or > ;
If it has, we can reset the cache expiration:
: reset-cache ( nanos -- ) cache-duration + cache-until set-global ;
And update the cached value (the result of calling
: update-cache ( nanos -- timestamp ) reset-cache now [ cache-value set-global ] keep ;
fast-now word is as easy as:
- Get the current monotonically increasing
- Check if the cache has expired.
- Update the cache if expired, otherwise return the cached value.
: fast-now ( -- timestamp ) nano-count dup cache-expired? [ update-cache ] [ drop cache-value get-global ] if ;
Note: we use a monotonic timer because it is a much faster operation than calling
gettimeofday(). Another fast way would be to use the instruction counter, however we would need to estimate cpu speed to know how many instructions the cached value should survive for.
We can try this out, and see how much faster it is:
( scratchpad ) [ 1000 [ now drop ] times ] benchmark . 19706313 ( scratchpad ) [ 1000 [ fast-now drop ] times ] benchmark . 356574
So, 1,000 calls to
now took 19.706 milliseconds, and only 0.356 milliseconds for
fast-now (for a speedup of 55x)! In the case of a single call, both
fast-now take roughly the same time. Not a bad improvement, right?
The code (and some tests) for this is on my Github.