Hello everyone,
I was doing some profiling on my two relays running on FreeBSD 13.1
and noticed that they were spending a lot of time in clock_gettime()
which prompted me to have a look at the implementation.
Time implementation
===================
The time implementation is abstracted in src/lib/time/compat_time.c
where different mechanisms are used for different operating systems.
On Linux, CLOCK_MONOTONIC_COARSE is a clock that gives worse precision
than CLOCK_MONOTONIC but is faster. The abstraction layer checks for
its presence and provides faster, less precise time where applicable.
On FreeBSD, there is also a fast monotonic time source available
called CLOCK_MONOTONIC_FAST. In the header file
src/lib/time/compat_time.h, a comment references this clock, but it is
not used. I thought it might be worth seeing what difference enabling
CLOCK_MONOTONIC_FAST on FreeBSD would make. On the VM where I run my
two FreeBSD relays, the difference was stunning.
I made a quick patch that simply replaces CLOCK_MONOTONIC_COARSE with
CLOCK_MONOTONIC_FAST (see the attached patches), then compiled and
tested it. I traced system calls to make sure the correct call was
being used, which it was.
Results
=======
The patch reduced the CPU usage of the patched relay by about 50 %
compared to the unpatched relay. I was a bit shocked, so I wrote a
small benchmark program and ran it on my VM, giving the following
results:
CLOCK_MONOTONIC: 4.776675 s
CLOCK_MONOTONIC_FAST: 0.260002 s
This shows that on my VM, CLOCK_MONOTONIC_FAST is about 18 times
faster than CLOCK_MONOTONIC.
I have tested on a few different systems and I think that the
performance increase of CLOCK_MONOTONIC_FAST is thanks to commit
60b0ad10dd0fc7ff6892ecc7ba3458482fcc064c - "vdso: lower precision of
vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST"
that was cherry-picked to 13.1.
Try it yourself and report your results
=======================================
If you want to benchmark your server to see whether switching clock
could benefit you, you can compile and run my attached test program by
doing
>clang bench.c -o bench
>./bench
In case the program terminates too quickly or slowly for your liking, adjust
const unsigned long iterations = 1000000;
up or down to change the execution time.
My supplied patches appear to work fine on my system, but aren't
really upstream appropriate since a solution that works for both
FreeBSD and Linux is needed. If you want to test them and you're
building Tor from the ports tree, drop them in
/usr/ports/security/tor/files and build and install.
I'm very interested in seeing some performance data from other people
to judge whether it is worth either pestering some Tor devs to have a
look at this or putting in some effort myself to write an upstreamable
patch.
Thank you for reading!
Cordially,
Andreas Kempe
(Attachment bench.c is missing)
(Attachment patch-src_lib_time_compat__time.c is missing)
(Attachment patch-src_lib_time_compat__time.h is missing)