Sometimes profiling an application will show hotspots in the oddest places. For some types of network applications that process huge volumes of events, calls to gettimeofday()
can become a bottleneck. If each event needs to have a timestamp generated, this could mean thousands of system calls in a very short time, all producing essentially the same value. If the "actual" time is not so important, performance could be gained by relaxing the requirement that all events received in a select()
loop have the same timestamp.
One way to do this is to cache the timestamp after waking up, and use that timestamp for all events received within the I/O loop. Unfortunately, the I/O loop can take a long time to process - resulting in timestamps that diverge from "actual" time by small or large (and unpredictable) amounts. Perhaps a better way would be to cache the result of gettimeofday()
with a resolution of one millisecond.
fast-now
In Factor, gettimeofday()
is called by the now
word from the calendar
vocabulary. Let's try and build a fast-now
word that adds caching:
USING: calendar kernel math namespaces system ;
Our cache resolution is one millisecond (or 1,000,000 nanoseconds):
CONSTANT: cache-duration 1000000
We will be keeping the cached value and the expiration time:
SYMBOL: cache-value SYMBOL: cache-until
Given the current time in nanoseconds, we can check to see if the cached value has expired:
: cache-expired? ( nanos -- ? ) cache-until get-global 0 or > ;
If it has, we can reset the cache expiration:
: reset-cache ( nanos -- ) cache-duration + cache-until set-global ;
And update the cached value (the result of calling now
):
: update-cache ( nanos -- timestamp ) reset-cache now [ cache-value set-global ] keep ;
Building the fast-now
word is as easy as:
- Get the current monotonically increasing
nano-count
. - Check if the cache has expired.
- Update the cache if expired, otherwise return the cached value.
: fast-now ( -- timestamp ) nano-count dup cache-expired? [ update-cache ] [ drop cache-value get-global ] if ;
Note: we use a monotonic timer because it is a much faster operation than calling gettimeofday()
. Another fast way would be to use the instruction counter, however we would need to estimate cpu speed to know how many instructions the cached value should survive for.
Try it
We can try this out, and see how much faster it is:
( scratchpad ) [ 1000 [ now drop ] times ] benchmark . 19706313 ( scratchpad ) [ 1000 [ fast-now drop ] times ] benchmark . 356574
So, 1,000 calls to now
took 19.706 milliseconds, and only 0.356 milliseconds for fast-now
(for a speedup of 55x)! In the case of a single call, both now
and fast-now
take roughly the same time. Not a bad improvement, right?
The code (and some tests) for this is on my Github.
No comments:
Post a Comment