[tor-relays] FreeBSD 13.1: clock_gettime(CLOCK_MONOTONIC_FAST) ~ 50 % performance gain

Hello everyone,

I was doing some profiling on my two relays running on FreeBSD 13.1
and noticed that they were spending a lot of time in clock_gettime()
which prompted me to have a look at the implementation.

Time implementation
===================

The time implementation is abstracted in src/lib/time/compat_time.c
where different mechanisms are used for different operating systems.
On Linux, CLOCK_MONOTONIC_COARSE is a clock that gives worse precision
than CLOCK_MONOTONIC but is faster, and the abstraction layer checks
for its presence and provides the more performant, less precise time
where applicable.

On FreeBSD, there is also a fast monotonic time source available
called CLOCK_MONOTONIC_FAST. In the header file
src/lib/time/compat_time.h, a comment references this clock, but it is
not used. I thought it might be worth seeing what difference enabling
CLOCK_MONOTONIC_FAST on FreeBSD would make, and on the VM where I run
my two FreeBSD relays, the difference was stunning.

I made a quick patch simply replacing CLOCK_MONOTONIC_COARSE with
CLOCK_MONOTONIC_FAST (see patches attached), then compiled and tested
it. I traced system calls to make sure the correct call was being
used, which it was.

Results
=======

This reduced the CPU usage of the patched relay by about 50 %
compared to the unpatched relay. I was a bit shocked so I wrote a
small benchmark program and ran it on my VM giving the following
results:

CLOCK_MONOTONIC: 4.776675 s
CLOCK_MONOTONIC_FAST: 0.260002 s

This shows that on my VM, CLOCK_MONOTONIC_FAST performs about 20 times
better than CLOCK_MONOTONIC.
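As a rough check of that figure, the ratio of the two timings is about 18.4. If the run used the default of 1000000 iterations from the test program (an assumption, since the iteration count isn't stated for these numbers), the per-call cost works out as follows.

```latex
\frac{4.776675\,\mathrm{s}}{0.260002\,\mathrm{s}} \approx 18.4,
\qquad
\frac{4.776675\,\mathrm{s}}{10^{6}} \approx 4.78\,\mu\mathrm{s},
\qquad
\frac{0.260002\,\mathrm{s}}{10^{6}} \approx 0.26\,\mu\mathrm{s}
```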

I have tested on a few different systems and I think that the
performance increase of CLOCK_MONOTONIC_FAST is thanks to commit
60b0ad10dd0fc7ff6892ecc7ba3458482fcc064c - "vdso: lower precision of
vdso implementation of CLOCK_MONOTONIC_FAST and CLOCK_UPTIME_FAST"
that was cherry-picked to 13.1.

Try it yourself and report your results
=======================================

If you want to benchmark your server to see whether switching clocks
could benefit you, you can compile and run my attached test program by
doing

  >clang -o bench bench.c
  >./bench

In case the program terminates too quickly or slowly for your liking, adjust

  const unsigned long iterations = 1000000;

up or down to change the execution time.

My supplied patches appear to work fine on my system, but aren't
really appropriate for upstream, since a solution that works for both
FreeBSD and Linux is needed. If you want to test them and you're
building Tor from the ports tree, drop them in
/usr/ports/security/tor/files and build and install.

I'm very interested in seeing some performance data from other people,
to help me decide whether it is worth either pestering some Tor devs
to have a look at this or putting in some effort myself to write an
upstreamable patch.

Thank you for reading!
Cordially,
Andreas Kempe

(Attachment bench.c is missing)

(Attachment patch-src_lib_time_compat__time.c is missing)

(Attachment patch-src_lib_time_compat__time.h is missing)

Excerpts from Andreas Kempe's message of June 21, 2022 11:50 am:

> I made a quick patch simply replacing CLOCK_MONOTONIC_COARSE with
> CLOCK_MONOTONIC_FAST (see patches attached), then compiled and tested
> it. I traced system calls to make sure the correct call was being
> used, which it was.

According to the clock_gettime man page, FreeBSD 13.1 has
CLOCK_MONOTONIC_COARSE, which it describes as an alias of
CLOCK_MONOTONIC_FAST for compatibility with other systems. I suppose
Tor could add

  #if !defined(CLOCK_MONOTONIC_COARSE) && defined(CLOCK_MONOTONIC_FAST)
  #define CLOCK_MONOTONIC_COARSE CLOCK_MONOTONIC_FAST
  #endif

but I'm not sure how useful that would be. OpenBSD and NetBSD don't
seem to define either. Perhaps something like that would be
appropriate for a FreeBSD ports patch.

Cheers,
Alex.

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

Excerpts from Andreas Kempe's message of June 21, 2022 11:50 am:
> I made a quick patch simply replacing CLOCK_MONOTONIC_COARSE with
> CLOCK_MONOTONIC_FAST (see patches attached), then compiled and tested
> it. I traced system calls to make sure the correct call was being
> used, which it was.

> According to the clock_gettime man page, FreeBSD 13.1 has
> CLOCK_MONOTONIC_COARSE, which it describes as an alias of
> CLOCK_MONOTONIC_FAST for compatibility with other systems.

Good catch! I happened to read the clock_gettime() man page on a
FreeBSD 13.0 system (which I was convinced was a 13.1 system), but
checked the header file on a 13.1 system, where I couldn't find
CLOCK_MONOTONIC_COARSE. A grep through /usr/include shows that it is
actually defined in another included header.

This being the case, the problem solves itself on FreeBSD 13.1. The
system I was patching Tor on was actually a 13.0 system; I was
convinced I had upgraded my VMs and never checked the version. 13.0
does not have the optimisation commit I dug out, yet FAST was still
20x faster there. I don't know whether this is specific to 13.0, but
since 13.0 reaches EoL soon, it might not matter much.

Of the other systems I benchmarked, 12.3 did not show any noticeable
difference between the two clocks; I could only see it on 13.1. Since
the systems do not have identical hardware, though, I don't know
whether that comes into play somehow.

> I suppose Tor could add
>
>   #if !defined(CLOCK_MONOTONIC_COARSE) && defined(CLOCK_MONOTONIC_FAST)
>   #define CLOCK_MONOTONIC_COARSE CLOCK_MONOTONIC_FAST
>   #endif
>
> but I'm not sure how useful that would be. OpenBSD and NetBSD don't
> seem to define either. Perhaps something like that would be
> appropriate for a FreeBSD ports patch.

I was contemplating a solution similar to this one, but thought
redefining a define was ugly, so for my PoC I used sed to get a
proper overview of where the actual changes ended up in the code.

I unfortunately don't have any other BSD flavours running where I
could bench performance. If users of other BSD flavours have time to
run the benchmark, it would be interesting to see the results for
sure.

Cordially,
Andreas Kempe

On Tue, Jun 21, 2022 at 12:31:08PM -0400, Alex Xu (Hello71) via tor-relays wrote:

For completeness' sake, I upgraded my VM to 13.1 and ran my benchmark
again. CLOCK_MONOTONIC is now only about 3 times slower than
CLOCK_MONOTONIC_FAST.

Building Tor unpatched now also works well right out of the box, and
I'm no longer seeing a storm of system calls, which leads me to wonder
whether this was some weird vDSO issue.

Cordially,
Andreas Kempe

On Tue, Jun 21, 2022 at 07:05:35PM +0200, Andreas Kempe wrote:

Thanks for looking into Tor performance on FreeBSD.

I'm seeing similar results on a physical ElectroBSD system
based on FreeBSD 13.1.

Some munin graphs and dmesg are available at:
<https://www.fabiankeil.de/blog-surrogat/2022/06/22/clock_gettime-patch-fuer-tor-auf-electrobsd-getestet.html>

Fabian

Andreas Kempe <kempe@lysator.liu.se> wrote on 2022-06-21 at 19:56:45:

> For completeness sake, I upgraded my VM to 13.1 and ran my benchmark
> again. The slowdown of CLOCK_MONOTONIC compared to
> CLOCK_MONOTONIC_FAST is now only about 3 times.
