[tor-relays] preventing DDoS is more than just network filtering

The graphs in [1] and [2] are IMO good examples related to [3]:

  "... in addition to network filtering, the (currently) sharp input signal ... is transformed into a smeared output response ... This shall make it harder for an attacker to gather infromation using time correlation techniques."

Feedback is welcome.

[1] torutils/network-metric.svg at main · toralf/torutils · GitHub
[2] torutils/network-metric-nextday.svg at main · toralf/torutils · GitHub
[3] GitHub - toralf/torutils: Few tools for a Tor relay.

···

--
Toralf

Hhm, my system log doesn't show any problems, maybe due to (or
regardless of?):
  CONFIG_SYN_COOKIES=y
?
Nevertheless, I updated the Readme to explain my point of view [1] [2].

[1] GitHub - toralf/torutils: Few tools for a Tor relay.
[2] GitHub - toralf/torutils: Few tools for a Tor relay.

···

On 11/8/22 10:57, Chris wrote:

> The main reason is that a simple SYN flood can quickly fill up your
> conntrack table, and then legitimate packets are quietly dropped; you
> won't see any problems, thinking everything is perfect with your server,
> unless you dig into your system logs.

--
Toralf

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

> > The main reason is that a simple SYN flood can quickly fill up your
> > conntrack table, and then legitimate packets are quietly dropped; you
> > won't see any problems, thinking everything is perfect with your server,
> > unless you dig into your system logs.
>
> Hhm, my system log doesn't show any problems, maybe due to (or
> regardless of?):
>   CONFIG_SYN_COOKIES=y
> ?

     On FreeBSD 12.3 I use pf and have gone back to using synproxy on the
"pass in" statements for the ORPort and DirPort, but I doubt it has actually
made any difference, because the only attacks I've seen so far were coming
via other relays and triggered tor's rejections of INTRODUCE2 cells by the
thousands. Instead, what has been very effective has been to increase the
NumCPUs count drastically. On a non-hyperthreaded quad-core CPU I now have
it set as "NumCPUs 20". Each worker thread uses almost no CPU time, but
having enough of them waiting to grab an onionskin off the queue instantly
seems to stop all messages about cells, onionskins, or connections being
dropped.
     During an attack I often saw all workers in top(1) screen updates with
"NumCPUs 16", so I increased it to 20 for the next restart, though I hadn't
gotten any of the aforementioned error/warning messages at 16. Unfortunately,
I have yet to see what happens at 20, because before the next restart Comcast
made a change that blocks me from running a relay. :-( I intend to find out
very soon whether I can afford to switch to their business network right away,
so that I might resume running my relay, or whether I will have to wait until
things happen next summer that should free up some of my limited income.

> Nevertheless, I updated the Readme to explain my point of view [1] [2].
>
> [1] GitHub - toralf/torutils: Few tools for a Tor relay.
> [2] GitHub - toralf/torutils: Few tools for a Tor relay.

                                  Scott Bennett, Comm. ASMELG, CFIAG

···

Toralf Förster <toralf.foerster@gmx.de> wrote:

On 11/8/22 10:57, Chris wrote:

**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good *
* objection to the introduction of that bane of all free governments *
* -- a standing army." *
* -- Gov. John Hancock, New York Journal, 28 January 1790 *
**********************************************************************

> On 11/10/2022 2:38 AM, Scott Bennett wrote:
>
>>> On 11/8/22 10:57, Chris wrote:
>>>> The main reason is that a simple SYN flood can quickly fill up your
>>>> conntrack table, and then legitimate packets are quietly dropped; you
>>>> won't see any problems, thinking everything is perfect with your server,
>>>> unless you dig into your system logs.
>>>
>>> Hhm, my system log doesn't show any problems, maybe due to (or
>>> regardless of?):
>>>   CONFIG_SYN_COOKIES=y
>>> ?
>>
>>      I surmise that the above is a Linuxism that is approximately equivalent
>> to a pf rule using synproxy.
      <pre class="moz-quote-pre" wrap="">
     On FreeBSD 12.3 I use pf and have gone back to using synproxy on the
"pass in" statements for the ORPort and DirPort, but I doubt it has actually
made any difference </pre>

     I should clarify my statement above: the SYN packets still have to be
received from my ISP before the rule can be applied, so yes, a SYN flood can
still tie up my Internet connection, but that does not appear to be the kind of
attack my relay was experiencing. Specifying synproxy on the "pass in" rules
for tor means that the kernel simply drops any pending connection that fails to
complete the SYN-SYNACK handshake within a short time, instead of passing it on
to tor to deal with; IOW, no incoming connections are passed to tor unless they
complete that handshake first.
     The second reason I made that statement is that all the attacks I have
seen in recent months have tied up my inbound (and sometimes outbound) data
capacity for some time, and the next set of heartbeat messages from tor then
shows an increase in INTRODUCE2 rejections of 2,000 to 3,000, or occasionally
more. I suspect the "occasionally more" cases occur when two of the bot attacks
hit my relay at the same or overlapping times. All of the above was true before
I began using synproxy again and appears to be the case still. If you have seen
SYN flood attacks, then that is grounds enough for me to leave synproxy in the
rules for tor indefinitely. The cost to the system of using synproxy is too
small to be detected, but the potential for sparing cost to tor appears to be
significant.
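
     For readers unfamiliar with pf: the setup described above might look
roughly like the following pf.conf fragment. This is a sketch of my own for
illustration, not Scott's actual ruleset; $ext_if and the ports 9001/9030
(typical ORPort/DirPort values) are placeholders.

```
# pf.conf sketch: synproxy makes the kernel complete the TCP handshake
# itself and hand the connection to tor only after the client finishes it.
# $ext_if, 9001 and 9030 are placeholders for your interface and ports.
pass in on $ext_if proto tcp from any to any port { 9001, 9030 } \
    flags S/SA synproxy state
```

Load it with "pfctl -f /etc/pf.conf" as usual; synproxy replaces, rather than
supplements, the ordinary "keep state" behavior on matching rules.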

> The quote about SYN Flood is actually from my post, which went only to
> Toralf and wasn't displayed on the group. My bad. To explain further, I
> didn't say the current attack includes SYN floods; what I meant was

     Ah. I see.

> whenever we have some conntrack rules in our iptables, it's prudent to
> have some rate-limiting rules before them, because if the attacker knows
> we rely on conntrack and intends to do some

     Not being a Linux user, I am unaware of what "conntrack" does. pf has a
"keep" flag that tells it to keep state for each connection, but many years ago
pf was changed to keep state anyway, whether one tells it to or not, so nowadays
it is effectively a comment. I don't know of any method by which one can tell pf
*not* to keep state.

> damage, the attacker can easily flood our conntrack table with a SYN
> flood, and then we start dropping legitimate packets without notice.
> However, you're correct: currently there are no SYN floods.

     Understood. Thank you for the clarification.
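
     Chris's suggestion of rate limiting ahead of conntrack could look roughly
like the iptables-restore fragment below. This is my own illustrative sketch,
not a rule set from either poster; the port (9001) and the rate/burst numbers
are arbitrary examples, not recommendations.

```
# Sketch: drop excess new SYNs to the ORPort before they create conntrack
# entries. Port 9001, 30/second and a burst of 60 are illustrative only.
*filter
-A INPUT -p tcp --dport 9001 --syn -m hashlimit --hashlimit-name tor-syn --hashlimit-mode srcip --hashlimit-above 30/second --hashlimit-burst 60 -j DROP
COMMIT
```

The hashlimit match keeps a per-source-IP token bucket, so one flooding host
exhausting its bucket does not penalize other clients.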
    <blockquote type="cite"
      cite="mid:202211100738.2AA7cw7d026293@sdf.org">
      <pre class="moz-quote-pre" wrap="">because the only attacks I've seen so far were coming
via other relays and triggered tor's rejections of INTRODUCE2 cells by the
thousands. Instead, what has been very effective has been to increase the
NumCPUs count drastically. </pre>
    </blockquote>
    <p><font size="-1"><font face="Arial">You're correct yet again. The
          number of CPUs make a huge difference. Tor automatically
          detects up to 16 CPUs if you have them. Anything above that,
          Tor can't see. I've never tried adding it to my torrc though,
          it might see more if you tell it to look for them.</font></font></p>

     It only looks for the number of CPU threads actually available if you don't
specify a value for NumCPUs. You can put any natural number there that you want,
unless there's some upper limit I don't know about, e.g., 255.
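
     For concreteness, the override being discussed is a one-line torrc
fragment; the value 20 is just the example from Scott's post, not a
recommendation:

```
# torrc sketch: start more crypto worker threads than auto-detection would
# pick, so an idle worker is always ready to take a queued onionskin.
NumCPUs 20
```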

    <p><font size="-1"><font face="Arial">On my relays which are run on
          VMs, I simply added more CPUs to the VM and somewhere around
          10 CPUs seemed to be the magic number when all the warning
          messages disappeared. They are currently happily running on
          12.</font></font></p>
    <p><br>
    </p>
    <blockquote type="cite"
      cite="mid:202211100738.2AA7cw7d026293@sdf.org">
      <pre class="moz-quote-pre" wrap="">On a non-hyperthreaded quad-core CPU I now have
it set as "NumCPUs 20". </pre>
    </blockquote>
    <p><font size="-1"><font face="Arial">OK I'm confused now, Are you
          saying that it's possible to tell Tor to use non existent CPUs
          and it actually works? That would be really cool. Is it
          because Tor assigns multiple worker threads to the same CPU?<br>

     Of course it's possible. NumCPUs only tells tor how many worker threads to
start. tor does not assign any CPU affinity, so everything gets handled by the
OS's scheduler. When the main thread encounters an onionskin that must be
decrypted, it places that onionskin onto a queue for some worker thread to pick
up as soon as a worker becomes available. Apparently how fast that occurs
determines whether tor begins dropping connections and issuing warning/error
messages, so having a lot of workers means that one is usually available or
becomes available very soon, and the timeout for decryption of that onionskin
to begin doesn't happen. IOW, the timeout seems to depend upon how long the
queued onionskin waits for decryption to *begin*, not to *complete*.
     Any time I've seen lots of workers active in top(1), they've been showing
less than 1% CPU usage apiece, so they usually have a higher priority than the
main thread unless, of course, the main thread is waiting for a select(2) or
some other I/O operation to be posted complete, in which case the main thread
will have a priority in the single digits anyway but isn't actually doing
anything at the time. Given that they use less than 1% CPU, it is frankly
rather difficult to find one actually running at any given instant with top(1);
instead they are usually in "kqueue" or some similar waiting state. When tor is
being assaulted with an INTRODUCE2 attack, the main thread usually runs at 8%
to 15% CPU usage. (These are attacks coming via other relays, so naturally the
synproxy condition is satisfied and has no effect.)
     All that having been written, I would like to point out that greatly
increasing NumCPUs does not *solve* the problem of the INTRODUCE2 attacks, nor
do I have any suggestions for how this type of attack can be prevented or
stopped. It is just a workaround that lets a relay survive them and keep
running, though at the cost of many thousands of unnecessary, undesirable
onionskin decryptions. On that scale, onionskin decryptions do become
significantly expensive, and the more so the larger the capacity of the relay's
Internet connection(s).
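
     The queue behaviour described above can be sketched with a toy
producer/worker model. This is my own illustration of the general pattern
(many mostly-idle workers keep the wait-to-begin near zero), not tor's actual
code; every name in it is made up for the example.

```python
import queue
import threading
import time

tasks = queue.Queue()   # "onionskins": here, just their enqueue timestamps
wait_times = []         # how long each item sat before work *began*
lock = threading.Lock()

def worker():
    # Block until an item appears, record its queue wait, then do the
    # (simulated) decryption work.
    while True:
        item = tasks.get()
        if item is None:              # shutdown sentinel
            tasks.task_done()
            return
        with lock:
            wait_times.append(time.monotonic() - item)
        time.sleep(0.001)             # stand-in for the decryption
        tasks.task_done()

NUM_WORKERS = 20                      # cf. "NumCPUs 20" in the post
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for _ in range(200):                  # a burst of incoming onionskins
    tasks.put(time.monotonic())
tasks.join()                          # wait until every item is handled

for _ in threads:                     # shut the pool down
    tasks.put(None)
for t in threads:
    t.join()

print(f"onionskins handled: {len(wait_times)}")
print(f"max wait before work began: {max(wait_times) * 1000:.1f} ms")
```

With plenty of workers the maximum wait stays tiny even during the burst;
shrink NUM_WORKERS to 1 or 2 and the tail of the queue waits visibly longer,
which mirrors the timeout behaviour described above.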

    <blockquote type="cite"
      cite="mid:202211100738.2AA7cw7d026293@sdf.org">
      <pre class="moz-quote-pre" wrap="">Each worker thread uses almost no CPU time, but
haveing enough of them waiting to grab an onionskin off the queue instantly
seems to stop all messages about cells, onionskins, or connections being
dropped.
     During an attack I often saw all workers in top(1) screen updates with
"NumCPUs 16", so I increased to 20 for the next restart, but I hadn't gotten
any of the aforementioned error/warn messages at 16. Unfortunately, I have
yet to see what happens at 20 because before the next restart Comcast made
a change that blocks me from running a relay. :frowning: I intend to find out very
soon whether I can afford to switch to their business network right away, so
that I might resume running my relay or will have to wait until things happen
next summer that should free up some of my limited income first.

     BTW, it is generally poor practice to post HTML to mailing lists. I usually
skip and delete HTML messages, but my eyeballs and brain are feeling fresher than
usual this evening, and your Subject: line was one I had responded to previously,
so I decided to wade through your message after all.

                                  Scott Bennett, Comm. ASMELG, CFIAG

···

Chris <tor@wcbsecurity.com> wrote:

      <pre class="moz-quote-pre" wrap="">Toralf F?rster <a class="moz-txt-link-rfc2396E" href="mailto:toralf.foerster@gmx.de">&lt;toralf.foerster@gmx.de&gt;</a> wrote:
