My relay recently started using more CPU than before, so I decided to build a binary that is allowed to use all the instructions available on my CPU. I was especially interested in enabling AVX2.
First of all, I tried to get chutney to run on my Windows machine. After several hacks it started, but the results were strange: ~2% CPU use for a single Tor process and 2.81 MBytes/s Single Stream Bandwidth. That is surely wrong, so the speed must have been limited by something else. I decided not to dig into it further and to proceed without benchmarks.
I ran ./configure --help and searched for useful options, but the only thing I found was the “influential environment variable” CFLAGS. Ok.
Usually I build tor with ./autogen.sh && ./configure --disable-unittests --disable-module-dirauth --disable-manpage --disable-html-manual --disable-asciidoc && make, but this time I tried ./autogen.sh && CFLAGS="-march=haswell" ./configure --disable-unittests --disable-module-dirauth --disable-manpage --disable-html-manual --disable-asciidoc && make.
The results were suspicious: the binary shrank from 14 MB to 5 MB, which is not what I expected from changing the instruction set. The next surprise was increased CPU load. Looking at the disassembly, I found that AVX2 was actually enabled, but the binary had also been built with the wrong curve25519_donna implementation.
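One rough way to repeat the disassembly check is to count the lines that touch 256-bit ymm registers. This is only a sketch: count_ymm_lines is a hypothetical helper name, the binary path in the usage comment is an assumption, and ymm registers are used by plain AVX as well as AVX2, so a nonzero count only shows that 256-bit vector code was emitted.

```shell
# Count disassembly lines that reference 256-bit ymm registers.
# Reads objdump-style text on stdin and prints the number of matching lines.
# Note: ymm registers appear in both AVX and AVX2 code.
count_ymm_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c -E 'ymm[0-9]+' || true
}

# Hypothetical usage (the path to the built binary is an assumption):
#   objdump -d src/app/tor | count_ymm_lines
```

Comparing the count between two builds is more telling than the absolute number, since some hand-written or library code may use vector registers regardless of the flags.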
By comparing logs from the different builds, I found that -march=haswell for some reason deactivated the -g and -O2 flags. Ok. The next attempt used this line: ./autogen.sh && CFLAGS="-march=haswell -g -O2" ./configure --disable-unittests --disable-module-dirauth --disable-manpage --disable-html-manual --disable-asciidoc && make. This time the binary went back to its 14 MB size, curve25519_donna was built correctly, and AVX2 was enabled. Looks good. CPU usage of the Tor process looks normal.
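A likely explanation for the missing flags is that Autoconf-generated configure scripts normally apply their default -g -O2 only when CFLAGS is not set by the user, so passing CFLAGS="-march=haswell" replaces the defaults instead of adding to them. A quick way to see what a given flag set turns on is to dump the compiler's predefined macros. The sketch below assumes GCC (or a compatible compiler) and uses a hypothetical helper name, report_build_macros.

```shell
# Report whether __AVX2__ and __OPTIMIZE__ appear in a predefined-macro dump.
# Reads the output of "gcc <flags> -dM -E -" on stdin.
report_build_macros() {
    dump=$(cat)
    for name in __AVX2__ __OPTIMIZE__; do
        if printf '%s\n' "$dump" | grep -q "^#define $name "; then
            echo "$name: defined"
        else
            echo "$name: not defined"
        fi
    done
}

# Usage with the two flag sets from the post:
#   gcc -march=haswell -dM -E - </dev/null | report_build_macros
#   gcc -march=haswell -g -O2 -dM -E - </dev/null | report_build_macros
```

__AVX2__ shows that the compiler may emit AVX2 instructions, and __OPTIMIZE__ shows that an optimization level is active. Neither says anything about which code paths Tor selects at build time.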
But I’m not sure whether I should be satisfied with the result. 1. Did it actually become faster? 2. Did I switch on the AVX2 optimizations correctly, or did I break something (as happened with curve25519_donna on the first try)?
Can anyone say what the correct method of enabling AVX2 for Tor is? And does anyone know whether it makes Tor faster or slower?
upd. I made a mistake in the first version of this topic: the binary produced with -g -O2 -march=haswell did not use AVX2. I rebuilt it with -march=haswell -g -O2 and now it has AVX2. GCC is very strange software.