Re: [tor-dev] Two features that would help load-balanced bridges

Linus Nordberg and I will have a paper at FOCI this summer on the
special way we run tor on the Snowflake bridges to permit better
scaling. It discusses the two workarounds from the post below, namely a
shim for predictable ExtORPort auth, and disabling onion key rotation.
This setup has been in place on Snowflake bridges since January 2022.
About 2.5% of Tor users (all users, not just bridge users) access Tor
using Snowflake, so it's not a niche use case even if it's just us.

One of the reviewers asked if there was a chance changes might be made
in tor that make our workarounds unnecessary. Is there anything to say
to this question? Might tor get a feature to control ExtORPort
authentication or onion key rotation, or is this something that's
planned to stay as it is, in favor of Arti? (Arti will probably remove the need for
the load-balanced multi-tor configuration, which will also remove the
need to disable onion key rotation, though better control over ExtORPort
auth could still be useful for running server PTs that are not child
processes of arti.)

Here is a draft of the paper; the relevant sections are 3.1 and 3.2.


On Mon, Feb 07, 2022 at 07:26:37PM -0700, David Fifield wrote:

After the blocking of Tor in Russia in December 2021, the number of
Snowflake users rapidly increased. Eventually the tor process became the
limiting factor for performance, using all of one CPU core.

In a thread on tor-relays, we worked out a design where we run multiple
instances of tor on the same host, all with the same identity keys, in
order to effectively use all the server's CPU resources. It's running on
the live bridge now, and as a result the bridge's bandwidth use has
roughly doubled.

Design thread
  [tor-relays] How to reduce tor CPU load on a single bridge?
Installation instructions
  Snowflake Bridge Installation Guide (Tor Project anti-censorship team wiki)

Two details came up that are awkward to deal with. We have workarounds
for them, but they could benefit from support from core tor. They are:

1. Provide a way to disable onion key rotation, or configure a custom
   onion key.
2. Provide a way to set a specific authentication cookie for ExtORPort
   SAFE_COOKIE authentication, or a new authentication type that doesn't
   require credentials that change whenever tor is restarted.

I should mention that, apart from the load-balancing design we settled
on, we have brainstormed some other options for scaling the Snowflake
bridge or bridges. At this point, none of these ideas can immediately be
put into practice, because there's no way to tell tor "connect to one of
these bridges at random, but only one," or "connect to this bridge, but
accept any of these fingerprints."
Prepare all pieces of the snowflake pipeline for a second snowflake bridge (Snowflake issue #28651)

# Disable onion key rotation

Multiple tor instances with the same identity keys will work fine for
the first 5 weeks (onion-key-rotation-days + onion-key-grace-period-days),
but after that time the instances will have independently rotated their
onion keys, and clients will have connection failures unless the load
balancer happens to connect them to the instance whose descriptor they
have cached. This post investigates what the failure looks like:
[tor-relays] How to reduce tor CPU load on a single bridge?
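The 5-week figure and the failure mode can be illustrated with a small sketch. It assumes the default values of the two consensus parameters are 28 and 7 days, and it uses a deliberately simplified toy model in which each instance holds exactly one onion key; the instance names and key labels are hypothetical.

```python
from datetime import timedelta

# Assumed defaults of the two consensus parameters named above.
ONION_KEY_ROTATION_DAYS = 28
ONION_KEY_GRACE_PERIOD_DAYS = 7


def safe_window(rotation_days=ONION_KEY_ROTATION_DAYS,
                grace_days=ONION_KEY_GRACE_PERIOD_DAYS):
    """Time during which identically seeded instances still share a usable onion key."""
    return timedelta(days=rotation_days + grace_days)


def usable_instances(cached_key, instance_keys):
    """Toy model: names of the instances whose current onion key matches
    the key in the descriptor the client has cached."""
    return [name for name, key in instance_keys.items() if key == cached_key]


if __name__ == "__main__":
    print(safe_window().days // 7, "weeks")
    # After independent rotation, every instance has its own key
    # (hypothetical labels), so only one instance can serve this client.
    keys = {"snowflake1": "key-A", "snowflake2": "key-B", "snowflake3": "key-C"}
    print(usable_instances("key-B", keys))
```

In this model a client succeeds only when the load balancer happens to pick the one matching instance, which is the intermittent failure described above.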

Examples of what could work here are a torrc option to set
onion-key-rotation-days to a large value, an option to disable onion key
rotation, or an option to set a certain named file as the onion key.

What we are doing now is a bit of a nasty hack: we create a directory
named secret_onion_key.old, so that a failed replace_file causes an
early exit from rotate_onion_key.
src/feature/relay/router.c (tor source code)
There are a few apparently benign side effects, like tor trying to
rebuild its descriptor every hour, but it's effective at stopping onion
key rotation.
[tor-relays] How to reduce tor CPU load on a single bridge?
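The mechanism behind the hack can be sketched outside of tor. The following is a minimal illustration, not the installation guide's procedure: it creates a directory named secret_onion_key.old and shows that renaming a regular file onto that path fails, which is the kind of failure the workaround depends on. The temporary keys directory and the placeholder key file are assumptions for the demo.

```python
import os
import tempfile


def block_onion_key_rotation(keys_dir):
    """Create a directory named secret_onion_key.old inside keys_dir.

    Renaming a regular file onto an existing directory fails, so a
    later attempt to move the current key to the .old name cannot succeed.
    """
    blocker = os.path.join(keys_dir, "secret_onion_key.old")
    os.makedirs(blocker, mode=0o700, exist_ok=True)
    return blocker


def rename_is_blocked(keys_dir):
    """Return True if renaming secret_onion_key to secret_onion_key.old fails."""
    src = os.path.join(keys_dir, "secret_onion_key")
    dst = os.path.join(keys_dir, "secret_onion_key.old")
    try:
        os.replace(src, dst)
    except OSError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as keys_dir:
        # Stand-in for a real key file; the contents are irrelevant here.
        with open(os.path.join(keys_dir, "secret_onion_key"), "wb") as f:
            f.write(b"placeholder, not a real key")
        block_onion_key_rotation(keys_dir)
        print("rename blocked:", rename_is_blocked(keys_dir))
```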

# Stable ExtORPort authentication

ExtORPort (extended ORPort) is a protocol that lets a pluggable
transport attach transport and client IP metadata to a connection, for
metrics purposes. In order to connect to the ExtORPort, the pluggable
transport needs to authenticate using a scheme like ControlPort
cookie authentication.
proposals/217-ext-orport-auth.txt (torspec, Tor's protocol specifications)
tor generates a secret auth cookie and stores it in a file. When the
pluggable transport process is managed by tor, tor tells the pluggable
transport where to find the file by setting the TOR_PT_AUTH_COOKIE_FILE
environment variable.
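To make the role of the cookie concrete, here is a minimal sketch of the hash computations in the SAFE_COOKIE exchange, as I understand them from proposal 217. The header and prefix strings are taken from my reading of that proposal and should be checked against it; the function names are my own. It covers only the hashing, not the wire protocol.

```python
import hashlib
import hmac
import os

# Constants as I understand them from proposal 217 (verify before relying on them).
COOKIE_HEADER = b"! Extended ORPort Auth Cookie !\x0a"  # 32 bytes
SERVER_HASH_PREFIX = b"ExtORPort authentication server-to-client hash"
CLIENT_HASH_PREFIX = b"ExtORPort authentication client-to-server hash"


def parse_cookie_file(data):
    """Return the 32-byte cookie from the contents of a cookie file."""
    if len(data) != 64 or not data.startswith(COOKIE_HEADER):
        raise ValueError("not an ExtORPort auth cookie file")
    return data[32:]


def server_hash(cookie, client_nonce, server_nonce):
    """Hash the server sends to prove it knows the cookie."""
    msg = SERVER_HASH_PREFIX + client_nonce + server_nonce
    return hmac.new(cookie, msg, hashlib.sha256).digest()


def client_hash(cookie, client_nonce, server_nonce):
    """Hash the client sends to prove it knows the cookie."""
    msg = CLIENT_HASH_PREFIX + client_nonce + server_nonce
    return hmac.new(cookie, msg, hashlib.sha256).digest()


if __name__ == "__main__":
    client_nonce, server_nonce = os.urandom(32), os.urandom(32)
    cookie_a = parse_cookie_file(COOKIE_HEADER + os.urandom(32))
    cookie_b = parse_cookie_file(COOKIE_HEADER + os.urandom(32))
    # A client holding cookie_a cannot authenticate to a server holding cookie_b.
    sent = client_hash(cookie_a, client_nonce, server_nonce)
    expected = client_hash(cookie_b, client_nonce, server_nonce)
    print("authenticated:", hmac.compare_digest(sent, expected))
```

Because both hashes are keyed by the cookie, a pluggable transport holding the wrong cookie fails authentication outright.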

In the load-balanced configuration, the pluggable transport server
(snowflake-server) is not run and managed by tor. It is an independent
daemon, so it doesn't have access to TOR_PT_AUTH_COOKIE_FILE (which
anyway would be a different path for every tor instance). The bigger
problem is that tor regenerates the auth cookie and rewrites the file on
every restart. All the tor instances have different cookies, and
snowflake-server does not know which it will get through the load
balancer, so it doesn't know what cookie to use.

Examples of what would work here are an option to use a certain file as
the auth cookie, an option to leave the auth cookie file alone if it
already exists, or a new ExtORPort authentication type that can use the
same credentials across multiple instances.
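The second option, leaving an existing cookie file alone, could behave roughly as sketched here. This is an illustration of the proposed behavior only, not tor's code and not an existing tor option; the function name is hypothetical, and the 64-byte file layout (32-byte header plus 32-byte cookie) is my reading of proposal 217.

```python
import os
import secrets

# File layout as I understand it from proposal 217: 32-byte header, 32-byte cookie.
COOKIE_HEADER = b"! Extended ORPort Auth Cookie !\x0a"


def load_or_create_cookie(path):
    """Reuse a well-formed cookie file if one exists, else write a new one."""
    try:
        with open(path, "rb") as f:
            data = f.read()
        if len(data) == 64 and data.startswith(COOKIE_HEADER):
            return data[32:]  # keep the existing cookie across restarts
    except FileNotFoundError:
        pass
    cookie = secrets.token_bytes(32)
    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(COOKIE_HEADER + cookie)
    return cookie
```

With this behavior, an operator could copy one cookie file into every instance's data directory before startup, and all instances would accept the same credentials.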

What we're doing now is using a shim program, extor-static-cookie, which
presents an ExtORPort interface with a static auth cookie for
snowflake-server to authenticate with, then re-authenticates to the
ExtORPort of its respective instance of tor, using that instance's auth
cookie.
[tor-relays] How to reduce tor CPU load on a single bridge?