[tor-relays] Security implications of disabling onion key rotation?

dcf · May 25, 2023, 12:54am

Linus Nordberg and I have had a paper accepted to FOCI 2023 on the
special pluggable transports configuration used on the Snowflake
bridges. That design was first hashed out on this mailing list last
year.

github.com/net4people/bbs

Running a load-balanced Tor bridge

opened 08:01PM - 08 Feb 22 UTC

wkrp

This post is about running multiple tor processes on one bridge, for better scaling on bridges that handle a lot of traffic. It is not a completely supported configuration, and requires a few workarounds. Most bridges do not need this. This setup is what is now running on the [Snowflake](https://snowflake.torproject.org/) bridge.

The [usual way](https://community.torproject.org/relay/setup/bridge/) to run a pluggable transport bridge is to run a single tor process, with the `ServerTransportPlugin` option set to the path of a pluggable transport executable. The tor process is responsible for running and managing the pluggable transport process. This is how we ran the Snowflake bridge until a few weeks ago. In Snowflake, the pluggable transport executable is [snowflake-server](https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/tree/main/server); it receives WebSocket connections from Snowflake proxies and forwards them to tor.

The number of Snowflake users rapidly increased after the [partial blocking of Tor in Russia](https://github.com/net4people/bbs/issues/97) in December 2021, which increased the load on the Snowflake bridge. Eventually it reached a point where the [tor process became a performance bottleneck](https://lists.torproject.org/pipermail/tor-relays/2021-December/020156.html). Because tor is [single-threaded](https://support.torproject.org/relay-operators/relay-bridge-overloaded/#tor-relay-load-onionskins-total-type-ntor-action-dropped-0), once it reaches 100% of one CPU, that's the limit. Adding more CPUs or increasing the speed of the network connection will not increase overall performance.

For [technical reasons](https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/28651) relating to Tor, it's not currently possible to run multiple independent bridges and, say, have Snowflake proxies choose one at random. The basic reason is that a Tor client expects to connect to a bridge with a certain [identity key](https://support.torproject.org/about/key-management/), and will cancel the connection if the key is not as expected. We brainstormed options in a thread on the tor-relays mailing list: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483

The design we settled on is to run multiple tor processes (currently 4), all with the same identity key. They are technically distinct bridges, but they can all substitute for one another in terms of authenticating to clients. Instead of snowflake-server being run and managed by tor, it runs independently, as a normal system daemon managed by systemd. snowflake-server connects to the multiple instances of tor through a load balancer (we are using [HAProxy](https://www.haproxy.org/), though we also prototyped successfully with [Nginx](https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/)). For the purposes of metrics, each instance of tor runs another component called extor-static-cookie, explained further below. The whole configuration looks like this:

![Diagram of the load-balanced bridge configuration, showing snowflake-server, haproxy, and four instances of tor+extor-static-cookie](https://user-images.githubusercontent.com/41267675/153043952-8efd2e38-448b-4dda-862d-43d1f9bde081.png)

Detailed installation instructions: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guides/Snowflake-Bridge-Installation-Guide?version_id=6de6facbb0fd047de978a561213c59224511445f

There are [a couple of awkward details](https://lists.torproject.org/pipermail/tor-dev/2022-February/014695.html) to deal with. The first is onion key rotation. Besides its long-term identity key, each tor bridge has an [onion key](https://support.torproject.org/about/key-management/) that is used for circuit encryption. The onion key is changed every four weeks, so even if the multiple tor instances all start with the same onion keys, they will [eventually diverge](https://lists.torproject.org/pipermail/tor-relays/2022-January/020196.html). As a workaround, we set filesystem permissions to prevent tor from rewriting its onion key files.

The second detail is ExtORPort authentication. [Extended ORPort (ExtORPort)](https://gitweb.torproject.org/torspec.git/tree/ext-orport-spec.txt?id=29245fd50d1ee3d96cca52154da4d888f34fedea#n145) is a protocol for attaching pluggable transport metadata to an incoming tor connection. It's the source of data for graphs like ["Bridge users by transport"](https://metrics.torproject.org/userstats-bridge-transport.html) and ["Bridge users by country"](https://metrics.torproject.org/userstats-bridge-country.html). The problem is that connecting to the ExtORPort requires [authenticating with a secret key](https://gitweb.torproject.org/torspec.git/tree/ext-orport-spec.txt?id=29245fd50d1ee3d96cca52154da4d888f34fedea#n62), and every instance of tor regenerates the key every time it is restarted. snowflake-server would not know which ExtORPort authentication key to use through the load balancer. Our workaround for this is a shim called [extor-static-cookie](https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html) that presents an ExtORPort with a shared, predictable authentication key to snowflake-server, then re-authenticates using the authentication key of its particular instance of tor.

Currently, on the Snowflake bridge, all the above components run on the same host. But the decoupling of tor and snowflake-server creates more options for future expansion. For example, it would be possible to run snowflake-server on one host, and all the instances of tor on another, nearby host. The next big hurdle will be when snowflake-server outgrows the resources of a single host, since it manages a lot of session state that is not trivial to distribute.
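
The onion key divergence described in the post above can be illustrated with a small simulation. This is only a sketch: `TorInstance`, `rotate`, and `instances_accepting` are hypothetical names, keys are random byte strings rather than real onion keys, and it ignores the grace period during which a real tor still accepts its previous onion key.

```python
import secrets


class TorInstance:
    """Hypothetical stand-in for one tor process behind the load balancer."""

    def __init__(self, onion_key: bytes, rotation_disabled: bool):
        self.onion_key = onion_key
        self.rotation_disabled = rotation_disabled

    def rotate(self) -> None:
        # Each instance generates its own new key, independently of the others.
        if not self.rotation_disabled:
            self.onion_key = secrets.token_bytes(32)


def instances_accepting(instances, cached_key: bytes) -> int:
    """Count the instances whose onion key matches what a client cached."""
    return sum(1 for inst in instances if inst.onion_key == cached_key)


shared_key = secrets.token_bytes(32)  # all four instances start identical
rotating = [TorInstance(shared_key, rotation_disabled=False) for _ in range(4)]
pinned = [TorInstance(shared_key, rotation_disabled=True) for _ in range(4)]

for inst in rotating + pinned:
    inst.rotate()  # one rotation period passes

print("rotating instances still matching:", instances_accepting(rotating, shared_key))
print("pinned instances still matching:", instances_accepting(pinned, shared_key))
```

With rotation enabled, the instances end up with four unrelated keys, so the load balancer could hand a client to an instance whose key the client does not expect. With rotation disabled, every instance remains interchangeable.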

There is a draft of the paper here:

https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.20230307.tex

A question that more than one reviewer asked is, what are the security
implications of disabling onion key rotation as we do? (Section 3.2 in
the draft.) It's a good question and one we'd like to address in the
final draft.

What are the risks of not rotating onion keys? My understanding is that
rotation is meant to enhance forward security; i.e., limit how far back
in time past recorded connections can be attacked in the case of key
compromise. Section 4 of the Tor design paper says:
  Short-term keys are rotated periodically and independently, to
  limit the impact of key compromise.
Do the considerations differ when using ntor keys versus TAP keys?

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

nickm · June 1, 2023, 1:07pm

> [...]
>
> What are the risks of not rotating onion keys? My understanding is that
> rotation is meant to enhance forward security; i.e., limit how far back
> in time past recorded connections can be attacked in the case of key
> compromise. Section 4 of the Tor design paper says:
>   Short-term keys are rotated periodically and independently, to
>   limit the impact of key compromise.

This is an interesting question!

So, compromising an onion key shouldn't be enough on its own to break
forward secrecy. The circuit extension handshakes use an additional
set of ephemeral keys as part of the negotiation process, which are
discarded immediately after the handshake. (These are the
Diffie-Hellman keys in TAP, and the x/X and y/Y keypairs in ntor.)
Assuming that this is done properly, and all the cryptographic
assumptions hold, these keys alone should make it impossible to
decrypt anything after the session keys are discarded.

The purpose of the onion key is, rather, to make it impossible for
somebody else to impersonate the target relay. If somebody steals
your onion key, and they have their own relay R, then they can use
your onion key to impersonate you whenever somebody tries to extend a
circuit from R to you.

Onion key rotation limits the time range in which this kind of attack
is useful: it will only work for as long as the onion key is listed in
a live directory.
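
To make the two roles concrete, here is a toy sketch of an ntor-style key agreement. It is not the real handshake: it uses a small finite-field group in place of Curve25519, omits the identity digest and authentication tag, and the names `keygen`, `derive`, `client_key`, and `relay_key` are made up for illustration. It shows only that the session key mixes an ephemeral Diffie-Hellman result with an onion-key result.

```python
import hashlib
import secrets

# Toy finite-field group, for illustration only. Real ntor uses Curve25519.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keygen():
    """Return a (private, public) pair in the toy group."""
    priv = secrets.randbelow(P - 3) + 2  # in [2, P - 2]
    return priv, pow(G, priv, P)


def derive(ephemeral_shared: int, onion_shared: int) -> str:
    """Hash both Diffie-Hellman results into one session key."""
    data = ephemeral_shared.to_bytes(16, "big") + onion_shared.to_bytes(16, "big")
    return hashlib.sha256(data).hexdigest()


def client_key(x: int, B: int, Y: int) -> str:
    # The client knows its ephemeral secret x, the relay's onion public
    # key B, and the relay's ephemeral public key Y.
    return derive(pow(Y, x, P), pow(B, x, P))


def relay_key(b: int, y: int, X: int) -> str:
    # Whoever holds the onion secret b and an ephemeral secret y can
    # compute the same key from the client's ephemeral public key X.
    return derive(pow(X, y, P), pow(X, b, P))


b, B = keygen()  # relay's onion key pair
x, X = keygen()  # client's ephemeral pair
y, Y = keygen()  # relay's ephemeral pair

honest = client_key(x, B, Y) == relay_key(b, y, X)

# Impersonation: an attacker who stole b picks a fresh ephemeral pair
# and completes the handshake in the relay's place.
y_evil, Y_evil = keygen()
impersonated = client_key(x, B, Y_evil) == relay_key(b, y_evil, X)

# Forward secrecy: from a recorded X, the stolen b yields only the
# onion-key half. The ephemeral half needs x or y, which were discarded.
onion_half_recovered = pow(X, b, P) == pow(B, x, P)

print(honest, impersonated, onion_half_recovered)
```

In this sketch a stolen `b` is enough to stand in for the relay in future handshakes, but it does not recover the key of a past session, because the `pow(Y, x, P)` term depends on secrets that no longer exist.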

(Now, any attacker who can steal your onion key can probably also
steal your identity key too, if you don't keep that offline, and use
it to impersonate you for even longer. The advantage of using a stolen
onion key is that it's much harder to detect; all the attacks I can
think of that use a stolen identity key involve, whereas the
onion-key-theft attack occurs when you are already in a perfect
position to be a MITM.)

> Do the considerations differ when using ntor keys versus TAP keys?

The argument above is the same with TAP and ntor, I'd say, except for
the fact that TAP just isn't that secure under modern assumptions: it
depends on RSA-1024 and DH-1024, both of which are believed to be
breakable if you have a large budget or a lot of stolen computers or a
lot of time.

Assuming that we care about these attacks, they _would_ make rotating
TAP keys more important: the longer any TAP onion keys are in use, the
more cost-effective it would be for an attacker to factor them...

...but there's another factor that makes TAP keys less important: they
simply shouldn't be used for anything modern in today's Tor. The last
thing that required the TAP handshake was some facets of the v2 onion
service protocol, and that's now been fully deprecated. So I wouldn't
personally worry about TAP too much.

hoping this helps and I haven't screwed up my analysis,

--
Nick

arma · June 1, 2023, 5:21pm

Thanks Nick! I endorse Nick's response, with two additions:

> Onion key rotation limits the time range in which this kind of attack
> is useful: it will only work for as long as the onion key is listed in
> a live directory.

For bridges it is a little bit different, because bridges don't have
an onion key listed in any public (consensus style) directory document
that clients get. Rather, the client connects to the bridge directly and
fetches a full timestamped descriptor from the bridge, which is signed
by the bridge's identity key, and which includes the onion key that the
client should use.

So if you have broken an old (rotated) onion key for a bridge, the
proper attack involves MITMing the connection to the bridge, breaking
or stealing the bridge's identity key too, and crafting a new descriptor
that lists the old onion key.

Whereas if the bridge never rotates the onion key, then you would be
able to successfully attack the CREATE cell that the client sends to
the bridge -- but only if you could see it, which would involve MITMing
the connection to the bridge and also being able to convince the client
that you are the bridge, which I think implies having or breaking the
identity key too. Doesn't seem so bad.

> (Now, any attacker who can steal your onion key can probably also
> steal your identity key too, if you don't keep that offline, and use
> it to impersonate you for even longer. The advantage of using a stolen
> onion key is that it's much harder to detect; all the attacks I can
> think of that use a stolen identity key involve, whereas the
> onion-key-theft attack occurs when you are already in a perfect
> position to be a MITM.)

"...involve publishing a new signed document which others could notice"
maybe?

Though for the bridge case, the attack could be more subtle, in that you
could provide a specially signed descriptor only to your victim user,
who would then learn the special onion key from that descriptor, use it,
and never know that other users received a different descriptor.

An attack like that isn't so bad though, because we still have the second
hop and third hop in the circuit, producing their own forward-secret
session keys with their own properly rotated onion keys, and having the
protections that Nick describes.

--Roger

dcf · June 28, 2023, 8:02am

Thanks, that helps. If I understand correctly, compromise of an onion
key allows an attacker to impersonate the relay because it is
effectively the relay's "identity" as far as CREATE cells and
circuit_send_first_onion_skin are concerned; i.e., the public onion key
is the "B" in the ntor handshake, in which the relay's actual long-term
identity key doesn't play a role. The only way the identity keys figure
into it is that they (via the signing keys) sign the consensus documents
that inform clients what onion keys to expect.

The way I'm planning to summarize this is that, with onion key rotation
disabled, you need to treat the now long-term onion keys as if they were
long-term identity keys.

dcf · June 28, 2023, 8:09am

> Thanks Nick! I endorse Nick's response, with two additions:
>
> > Onion key rotation limits the time range in which this kind of attack
> > is useful: it will only work for as long as the onion key is listed in
> > a live directory.
>
> For bridges it is a little bit different, because bridges don't have
> an onion key listed in any public (consensus style) directory document
> that clients get. Rather, the client connects to the bridge directly and
> fetches a full timestamped descriptor from the bridge, which is signed
> by the bridge's identity key, and which includes the onion key that the
> client should use.

Thanks, that was a subtlety I had missed. Since we are writing about
bridges, I mostly want to give the bridge perspective. We had formerly
written this:
  A relay's current onion keys appear in the Tor network
  consensus; when clients make circuits through it, they expect it
  to use certain onion keys.
We've now changed it to:
  Tor clients cache a bridge's onion public keys when they
  connect; subsequent connections only work if the cached keys are
  among the bridge's two most recently used sets of onion keys.

Here's my old post when I tested what would happen if a client cached
one onion key on the first attempt and then the onion key was not the
same on the second attempt:
https://lists.torproject.org/pipermail/tor-relays/2022-January/020238.html
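
The "two most recently used sets" rule in the revised sentence can be sketched as below. This is a simplified model under stated assumptions, not tor's implementation: `Bridge`, `rotate`, and `accepts` are hypothetical names, and a key is just a random byte string.

```python
import secrets
from collections import deque


class Bridge:
    """Hypothetical model: a bridge that honors its current and previous onion keys."""

    def __init__(self):
        # maxlen=2 keeps only the two most recent keys; adding a third on
        # the left silently drops the oldest one from the right.
        self.onion_keys = deque([secrets.token_bytes(32)], maxlen=2)

    @property
    def current_key(self) -> bytes:
        return self.onion_keys[0]

    def rotate(self) -> None:
        self.onion_keys.appendleft(secrets.token_bytes(32))

    def accepts(self, cached_key: bytes) -> bool:
        """Would a handshake using the client's cached key succeed?"""
        return cached_key in self.onion_keys


bridge = Bridge()
cached = bridge.current_key  # the client caches the key on first connect

print(bridge.accepts(cached))  # key is current
bridge.rotate()
print(bridge.accepts(cached))  # key is now the previous one
bridge.rotate()
print(bridge.accepts(cached))  # key has aged out
```

A client that cached a key keeps working through one rotation and fails after the second, unless it fetches a fresh descriptor in between.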

> So if you have broken an old (rotated) onion key for a bridge, the
> proper attack involves MITMing the connection to the bridge, breaking
> or stealing the bridge's identity key too, and crafting a new descriptor
> that lists the old onion key.
>
> Whereas if the bridge never rotates the onion key, then you would be
> able to successfully attack the CREATE cell that the client sends to
> the bridge -- but only if you could see it, which would involve MITMing
> the connection to the bridge and also being able to convince the client
> that you are the bridge, which I think implies having or breaking the
> identity key too. Doesn't seem so bad.

So it sounds like compromise of an onion key is no worse than compromise
of an identity key, because with an identity key an attacker could cook
up and sign a new onion key. The exception is that if an attacker
somehow got an identity key but not current onion keys, and it's a
bridge that's affected rather than a relay, then the attacker would not
be able to fool clients that had previously connected and cached the
past genuine onion keys.

arma · July 5, 2023, 5:28pm

> Thanks, that was a subtlety I had missed. Since we are writing about
> bridges, I mostly want to give the bridge perspective. We had formerly
> written this:
>   A relay's current onion keys appear in the Tor network
>   consensus; when clients make circuits through it, they expect it
>   to use certain onion keys.
> We've now changed it to:
>   Tor clients cache a bridge's onion public keys when they
>   connect; subsequent connections only work if the cached keys are
>   among the bridge's two most recently used sets of onion keys.

Makes sense.

> So it sounds like compromise of an onion key is no worse than compromise
> of an identity key, because with an identity key an attacker could cook
> up and sign a new onion key. The exception is that if an attacker
> somehow got an identity key but not current onion keys, and it's a
> bridge that's affected rather than a relay, then the attacker would not
> be able to fool clients that had previously connected and cached the
> past genuine onion keys.

Right. But that window where the cached version protects you is quite
narrow -- it looks like modern clients fetch a new bridge descriptor
every TestingBridgeDownloadInitialDelay (3 hours) (see where we set
next_attempt_at in learned_bridge_descriptor()), and not too long ago
we fetched a fresh bridge descriptor hourly.

The reasoning for the frequent fetches is that fetching the bridge's
descriptor over a one-hop circuit is a low cost operation, and it doubles
as a crude liveness check (since if it succeeds, the bridge should work
for real circuits too, and if it fails, we should mark the bridge as
not working currently).

When Tor starts up with a cached bridge descriptor with a timestamp we like,
I imagine it is not long until we attempt to fetch a fresh descriptor. I
haven't checked our current behavior though. But I would not rely on
this onion key caching for security.
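
As a rough illustration of how narrow that window is, the arithmetic can be sketched like this. It assumes a client that refetches on a fixed three-hour schedule counted from its last fetch, which is a simplification of whatever tor actually does; `REFETCH_INTERVAL` and `cache_protection_window` are hypothetical names.

```python
from datetime import datetime, timedelta

# The three-hour figure quoted above; a fixed schedule is an assumption.
REFETCH_INTERVAL = timedelta(hours=3)


def cache_protection_window(last_fetch: datetime, compromise: datetime) -> timedelta:
    """Time from an identity-key compromise until the client's next refetch.

    During this window the client still uses the genuine cached onion key;
    afterwards it would accept a descriptor signed with the stolen key.
    """
    if compromise < last_fetch:
        raise ValueError("compromise must not precede the last fetch")
    intervals_done = (compromise - last_fetch) // REFETCH_INTERVAL
    next_fetch = last_fetch + (intervals_done + 1) * REFETCH_INTERVAL
    return next_fetch - compromise


last_fetch = datetime(2023, 7, 5, 12, 0)
window = cache_protection_window(last_fetch, datetime(2023, 7, 5, 13, 0))
print(window)
```

Under these assumptions the cached key never protects a client for more than one refetch interval, which matches the advice not to rely on it.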

Hope this helps!
--Roger
