[tor-dev] Is Arti expected to have better multi-CPU support than C-tor?

Linus Nordberg and I are preparing a submission for FOCI about the
special way we run tor on the Snowflake bridge. We run many tor
processes with the same identity and onion keys, because otherwise tor
being limited to one CPU would be the main bottleneck.

I'm writing to fact-check a claim about Arti and how we hope the current
complicated procedure will not be needed in the future:

  The first and most important bottleneck to overcome is the
  single-threaded nature of the Tor implementation.² A single Tor
  process is limited to one CPU core: once Tor hits 100% CPU, the
  performance of the bridge is capped, no matter the speed of the
  network connection or the number of CPU cores

  ²We expect that Arti, the in-progress reimplementation of Tor,
  will be natively multi-threaded, and remove this primary
  complication.

Is this correct? Is a relay that uses a future version of Arti expected
to be able to use all its CPU resources?

Here is a draft of the submission. If you have any comments, our
submission deadline is 2023-03-15.

https://www.bamsoftware.com/papers/pt-bridge-hiperf/pt-bridge-hiperf.20230307.tex


_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

Yes, that’s right. There is no “main thread” in Arti; it’s written in an asynchronous task-oriented style, and we use a runtime written in Rust (Tokio by default, but we abstract over runtimes so you can swap them out) to schedule tasks across multiple threads.
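Here is a rough illustration of what "no main thread" means: a toy executor where every task goes through one shared run queue and any worker thread may poll it. This is a sketch under stated assumptions, not Arti's or Tokio's code. Tokio's multi-threaded scheduler is considerably more sophisticated, for example using per-worker queues with work stealing. The names `Executor`, `Task`, and `YieldOnce` are hypothetical, and the sketch uses only the Rust standard library.

```rust
use std::collections::HashSet;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, JoinHandle, ThreadId};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

/// Hypothetical task type: a future plus a handle to the shared run queue.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: Mutex<Sender<Arc<Task>>>,
}

impl Wake for Task {
    // Waking re-enqueues the task; whichever worker dequeues it next polls it.
    // No task is tied to a particular ("main") thread.
    fn wake(self: Arc<Self>) {
        let queue = self.queue.lock().unwrap().clone();
        let _ = queue.send(self);
    }
}

/// Hypothetical toy executor: N worker threads sharing one queue.
struct Executor {
    queue: Sender<Arc<Task>>,
    workers: Vec<JoinHandle<()>>,
}

impl Executor {
    fn new(num_workers: usize) -> Self {
        let (tx, rx) = channel::<Arc<Task>>();
        let rx = Arc::new(Mutex::new(rx));
        let workers = (0..num_workers)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // Hold the receiver lock only while dequeuing one task.
                    let task = match rx.lock().unwrap().recv() {
                        Ok(task) => task,
                        Err(_) => break, // every Sender dropped: shut down
                    };
                    let waker = Waker::from(Arc::clone(&task));
                    let mut cx = Context::from_waker(&waker);
                    let mut slot = task.future.lock().unwrap();
                    if let Some(mut fut) = slot.take() {
                        if fut.as_mut().poll(&mut cx).is_pending() {
                            *slot = Some(fut); // keep it until the next wake
                        }
                    }
                })
            })
            .collect();
        Executor { queue: tx, workers }
    }

    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static) {
        let task = Arc::new(Task {
            future: Mutex::new(Some(Box::pin(fut))),
            queue: Mutex::new(self.queue.clone()),
        });
        self.queue.send(task).expect("worker threads have exited");
    }

    /// Drop our Sender, then wait for workers to drain the remaining tasks.
    fn join(self) {
        drop(self.queue);
        for worker in self.workers {
            worker.join().unwrap();
        }
    }
}

/// Returns Pending once (after asking to be re-polled), so the task goes
/// back through the shared queue and may resume on another worker.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn main() {
    let executor = Executor::new(4);
    let done = Arc::new(AtomicUsize::new(0));
    let seen: Arc<Mutex<HashSet<ThreadId>>> = Arc::new(Mutex::new(HashSet::new()));
    for _ in 0..100 {
        let done = Arc::clone(&done);
        let seen = Arc::clone(&seen);
        executor.spawn(async move {
            seen.lock().unwrap().insert(thread::current().id());
            YieldOnce(false).await;
            seen.lock().unwrap().insert(thread::current().id());
            done.fetch_add(1, Ordering::SeqCst);
        });
    }
    executor.join();
    println!("tasks completed: {}", done.load(Ordering::SeqCst));
    println!("worker threads that ran tasks: {}", seen.lock().unwrap().len());
}
```

The number of distinct worker threads printed varies from run to run, because scheduling is up to the OS. The single shared queue is the simplest correct design. It is also an obvious contention point, which is one reason production runtimes spread work across per-thread queues.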

That said, we have spent approximately zero time so far tuning this multithreading, and I’d be surprised if it scales perfectly the first time. Our first opportunity to show off here will be when we get onion service support later this year.
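Amdahl's law is one standard way to see why adding cores need not give a proportional speedup. This is our addition, not something Nick cites. Suppose a hypothetical fraction p of the work parallelizes perfectly and the rest stays serial. Then the speedup on n cores is:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad
S(8)\big|_{p = 0.9} = \frac{1}{0.1 + 0.1125} = \frac{1}{0.2125} \approx 4.71, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = 10 .
```

The value p = 0.9 is purely illustrative, not a measurement of Arti. Lock contention and scheduling overhead, which untuned code often has, are outside this model and typically push real-world scaling lower still.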

cheers,
Nick

On Tue, Mar 7, 2023 at 4:07 PM David Fifield <david@bamsoftware.com> wrote:

> Linus Nordberg and I are preparing a submission for FOCI about the
> special way we run tor on the Snowflake bridge. We run many tor
> processes with the same identity and onion keys, because otherwise tor
> being limited to one CPU would be the main bottleneck.
>
> I’m writing to fact-check a claim about Arti and how we hope the current
> complicated procedure will not be needed in the future:
>
>   The first and most important bottleneck to overcome is the
>   single-threaded nature of the Tor implementation.² A single Tor
>   process is limited to one CPU core: once Tor hits 100% CPU, the
>   performance of the bridge is capped, no matter the speed of the
>   network connection or the number of CPU cores
>
>   ²We expect that Arti, the in-progress reimplementation of Tor,
>   will be natively multi-threaded, and remove this primary
>   complication.
>
> Is this correct? Is a relay that uses a future version of Arti expected
> to be able to use all its CPU resources?

Somewhat related: Rust programs can often outperform their C
counterparts when their authors make performance a priority. This is
mainly because aggressive multithreading optimizations can be done safely.

A prominent example is [fd](https://github.com/sharkdp/fd), "a simple,
fast and user-friendly alternative to 'find'", which uses multiple
threads to traverse the file system, making it around 50% faster than
find(1). Just imagining parallel file-system access in C gives me
nightmares. :^)

-- Emil


On Wed, Mar 08, 2023 at 06:30:42AM -0500, Nick Mathewson wrote:

> That said, we have spent approximately zero time so far tuning this
> multithreading, and I'd be surprised if it scales perfectly the first
> time.


Thank you, Nick.
