May 25, 2023, 12:54am
Linus Nordberg and I have had a paper accepted to FOCI 2023 on the
special pluggable transports configuration used on the Snowflake
bridges. That design was first hashed out on this mailing list last
[I'm about to go off-line for some days, so I am sending my current
suboptimally-organized reply, which I hope is better than waiting another
week to respond :)]
Let's make a distinction between the "frontend" snowflake-server
pluggable transport process, and the "backend" tor process. These don't
necessarily have to be 1:1; either one could be run in multiple
instances. Currently, the "backend" tor is the limiting factor, because
it uses only 1 CPU core. The "frontend" snowflake-server …
08:01PM - 08 Feb 22 UTC
This post is about running multiple tor processes on one bridge, for better scaling on bridges that handle a lot of traffic. It is not a completely supported configuration, and requires a few workarounds. Most bridges do not need this. This setup is what is now running on the [Snowflake](https://snowflake.torproject.org/) bridge.
The [usual way](https://community.torproject.org/relay/setup/bridge/) to run a pluggable transport bridge is to run a single tor process, with the `ServerTransportPlugin` option set to the path of a pluggable transport executable. The tor process is responsible for running and managing the pluggable transport process. This is how we ran the Snowflake bridge until a few weeks ago. In Snowflake, the pluggable transport executable is [snowflake-server](https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/tree/main/server); it receives WebSocket connections from Snowflake proxies and forwards them to tor.
The number of Snowflake users rapidly increased after the [partial blocking of Tor in Russia](https://github.com/net4people/bbs/issues/97) in December 2021, which increased the load on the Snowflake bridge. Eventually it reached a point where the [tor process became a performance bottleneck](https://lists.torproject.org/pipermail/tor-relays/2021-December/020156.html). Because tor is [single-threaded](https://support.torproject.org/relay-operators/relay-bridge-overloaded/#tor-relay-load-onionskins-total-type-ntor-action-dropped-0), once it reaches 100% of one CPU, that's the limit. Adding more CPUs or increasing the speed of the network connection will not increase overall performance.
For [technical reasons](https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/28651) relating to Tor, it's not currently possible to run multiple independent bridges and, say, have Snowflake proxies choose one at random. The basic reason is that a Tor client expects to connect to a bridge with a certain [identity key](https://support.torproject.org/about/key-management/), and will cancel the connection if the key is not as expected.
We brainstormed options in a thread on the tor-relays mailing list:
The design we settled on is to run multiple tor processes (currently 4), all with the same identity key. They are technically distinct bridges, but they can all substitute for one another in terms of authenticating to clients. Instead of snowflake-server being run and managed by tor, it runs independently, as a normal system daemon managed by systemd. snowflake-server connects to the multiple instances of tor through a load balancer (we are using [HAProxy](https://www.haproxy.org/), though we also prototyped successfully with [Nginx](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/)). For the purposes of metrics, each instance of tor runs another component called extor-static-cookie, explained further below.
The whole configuration looks like this:
![Diagram of the load-balanced bridge configuration, showing snowflake-server, haproxy, and four instances of tor+extor-static-cookie](https://user-images.githubusercontent.com/41267675/153043952-8efd2e38-448b-4dda-862d-43d1f9bde081.png)
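As a rough illustration of the load-balancing step, here is a minimal Python sketch of how new connections from snowflake-server could be spread across four interchangeable tor instances. The backend addresses and the round-robin policy are assumptions made for illustration; the post does not say which balancing algorithm HAProxy is configured to use, and this is not the bridge's actual configuration.

```python
from itertools import cycle

# Hypothetical local addresses, one per tor instance (via its
# extor-static-cookie shim). The real ports are whatever the operator chose.
BACKENDS = [("127.0.0.1", 10001 + i) for i in range(4)]


def make_round_robin(backends):
    """Return a function that picks the next backend for each new connection."""
    ring = cycle(backends)
    return lambda: next(ring)


def assign(num_connections, backends=BACKENDS):
    """Count how many connections each backend receives under round-robin."""
    pick = make_round_robin(backends)
    counts = {backend: 0 for backend in backends}
    for _ in range(num_connections):
        counts[pick()] += 1
    return counts


if __name__ == "__main__":
    for backend, count in assign(10).items():
        print(backend, count)
```

Because every instance shares the same identity key, a client cannot tell which backend it was assigned to, which is what makes a simple policy like this workable.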
Detailed installation instructions:
There are [a couple of awkward details](https://lists.torproject.org/pipermail/tor-dev/2022-February/014695.html) to deal with. The first is onion key rotation. Besides its long-term identity key, each tor bridge has an [onion key](https://support.torproject.org/about/key-management/) that is used for circuit encryption. The onion key is changed every four weeks, so even if the multiple tor instances all start with the same onion keys, they will [eventually diverge](https://lists.torproject.org/pipermail/tor-relays/2022-January/020196.html). As a workaround, we set filesystem permissions to prevent tor from rewriting its onion key files.

The second detail is ExtORPort authentication. [Extended ORPort (ExtORPort)](https://gitweb.torproject.org/torspec.git/tree/ext-orport-spec.txt?id=29245fd50d1ee3d96cca52154da4d888f34fedea#n145) is a protocol for attaching pluggable transport metadata to an incoming tor connection. It's the source of data for graphs like ["Bridge users by transport"](https://metrics.torproject.org/userstats-bridge-transport.html) and ["Bridge users by country"](https://metrics.torproject.org/userstats-bridge-country.html). The problem is that connecting to the ExtORPort requires [authenticating with a secret key](https://gitweb.torproject.org/torspec.git/tree/ext-orport-spec.txt?id=29245fd50d1ee3d96cca52154da4d888f34fedea#n62), and every instance of tor regenerates the key every time it is restarted. snowflake-server would not know which ExtORPort authentication key to use through the load balancer. Our workaround for this is a shim called [extor-static-cookie](https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html) that presents an ExtORPort with a shared, predictable authentication key to snowflake-server, then re-authenticates using the authentication key of its particular instance of tor.
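To make the ExtORPort authentication problem concrete, here is a small Python sketch of the hash exchange as I understand it from the ExtORPort spec linked above: both sides compute an HMAC-SHA256 keyed by the cookie over a fixed label and the two nonces. The label strings and 32-byte sizes are my reading of the spec and should be checked against it; the function names are hypothetical and this is not code from snowflake-server or extor-static-cookie.

```python
import hashlib
import hmac
import os

# Label strings as I read them in ext-orport-spec.txt (assumption: verify
# against the spec before relying on them).
SERVER_HASH_LABEL = b"ExtORPort authentication server-to-client hash"
CLIENT_HASH_LABEL = b"ExtORPort authentication client-to-server hash"


def safe_cookie_hashes(cookie, client_nonce, server_nonce):
    """Compute the (server_hash, client_hash) pair for one handshake."""
    nonces = client_nonce + server_nonce
    server_hash = hmac.new(cookie, SERVER_HASH_LABEL + nonces, hashlib.sha256).digest()
    client_hash = hmac.new(cookie, CLIENT_HASH_LABEL + nonces, hashlib.sha256).digest()
    return server_hash, client_hash


def client_accepts(client_cookie, server_cookie):
    """Simulate one handshake; True if the client would accept the server."""
    client_nonce = os.urandom(32)
    server_nonce = os.urandom(32)
    # The server proves knowledge of its own cookie.
    server_hash, _ = safe_cookie_hashes(server_cookie, client_nonce, server_nonce)
    # The client checks that proof against the cookie it holds.
    expected, _ = safe_cookie_hashes(client_cookie, client_nonce, server_nonce)
    return hmac.compare_digest(server_hash, expected)


if __name__ == "__main__":
    shared = os.urandom(32)
    print("shared static cookie:", client_accepts(shared, shared))
    print("per-instance cookie: ", client_accepts(shared, os.urandom(32)))
```

With a cookie that each tor instance regenerates on restart, snowflake-server behind the load balancer would be in the second situation for most backends. A shim that presents the same static cookie on every instance puts it in the first.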
Currently, on the Snowflake bridge, all the above components run on the same host. But the decoupling of tor and snowflake-server creates more options for future expansion. For example, it would be possible to run snowflake-server on one host, and all the instances of tor on another, nearby host. The next big hurdle will be when snowflake-server outgrows the resources of a single host, since it manages a lot of session state that is not trivial to distribute.
There is a draft of the paper here:
A question that more than one reviewer asked is, what are the security
implications of disabling onion key rotation as we do? (Section 3.2 in
the draft.) It's a good question and one we'd like to address in the
What are the risks of not rotating onion keys? My understanding is that
rotation is meant to enhance forward security; i.e., limit how far back
in time past recorded connections can be attacked in the case of key
compromise. https://spec.torproject.org/tor-design Section 4 says:
Short-term keys are rotated periodically and independently, to
limit the impact of key compromise.
Do the considerations differ when using ntor keys versus TAP keys?