[tor-relays] General overload -> DNS timeouts

nusenu:

Anders Trier Olesen:

The Tor relay guide should recommend running your recursive resolver
(unbound) on a different IP than your exit:
Tor Project | Exit Relay

yes, that is a good idea, here is a PR for it:

add new recommendations to DNS on Exit Relays section by nusenu · Pull Request #169 · torproject/community · GitHub

Thanks. I created a ticket[1] for it in our bug tracker, so your PR does not fall through the cracks.

Georg

[1] Add new recommendations to DNS on exit relays section (#239) · Issues · The Tor Project / Web / community · GitLab

I've been scratching my head with this as well. My exit family is shown
as overloaded on Tor Metrics [1]. All four instances run on one OpenBSD
box with ~50% CPU utilization. I've tried a local Unbound resolver as
well as the resolver provided by my colocation network, but the Tor log
and the metrics port keep showing ~1.5% DNS timeouts. I myself don't
notice any DNS issues, but I'm not actively monitoring it. The metrics
port and Tor log don't show any other issues besides DNS timeouts.

I don't know what the default OpenBSD DNS timeout is. It's not
configurable in /etc/resolv.conf, nor is it described in its man page.
My own testing shows that an nslookup timeout takes 15 seconds.

Should I just ignore Tor Metrics saying that my relay is overloaded and
the Tor log saying that the DNS timeouts are above threshold? I
understand that DNS issues are really bad for UX so I want to fix this
if possible.

Thanks,

Imre

[1] Relay Search
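
For operators who, like Imre, are not actively monitoring DNS, the following is a minimal sketch of how the share of slow or failed lookups could be sampled from the relay host. It is an illustration only: it goes through the operating system's resolver (socket.gethostbyname), not through tor's own resolver code, so its numbers will not match the Tor log or the MetricsPort exactly. The function name, the hostnames and the threshold are assumptions made for this example.

```python
import socket
import time
from typing import Callable, Iterable


def slow_or_failed_fraction(
    hostnames: Iterable[str],
    threshold_s: float,
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> float:
    """Return the fraction of lookups that failed or took longer than threshold_s.

    Note: any resolver error (including a non-existent domain) counts as a
    failure here, so this is an upper bound on real timeouts.
    """
    total = 0
    bad = 0
    for name in hostnames:
        total += 1
        start = time.monotonic()
        try:
            resolve(name)
            failed = False
        except OSError:
            # socket.gaierror is a subclass of OSError
            failed = True
        elapsed = time.monotonic() - start
        if failed or elapsed > threshold_s:
            bad += 1
    return bad / total if total else 0.0
```

Calling slow_or_failed_fraction(["example.com", "example.org"], threshold_s=5.0) on the exit host gives a rough value to compare against the ~1.5% figure; the resolve parameter exists so the function can be exercised without network access.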

···

On Tue, Nov 09, 2021 at 06:25:31AM -0500, John Csuti via tor-relays wrote:

Hello all,

I would have to agree on this: it appears that the DNS failure timeout is
too low. I have more than enough bandwidth to host Tor exit nodes and
my own full recursive Unbound resolver, and yet I still get the timeout
message at 1-1.5%, sometimes even odd amounts such as 40-50%.

I have been working with a few people on this issue and nothing we have
tried has fixed this. The other thing is that all other servers I run
have no issue with DNS timeouts. It appears to be a Tor-only issue. I
would even say that some DNS queries that Tor makes are to taken-down
sites, fake sites or non-existent domains.

My big family shows the same behavior: in the metrics, all relays turned "yellow" after the update of the tor software.

Olaf


_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

I get that too. I've noticed that Tor makes a lot of requests to non-existent domains. I run a Pi-hole DNS without the ad blocking. I think this is a bug. They should at least give us the ability to control the warning level.


bobby stickel:

I get that too. I've noticed that Tor makes a lot of requests to non-existent
domains. I run a Pi-hole DNS without the ad blocking. I think this is a bug. They
should at least give us the ability to control the warning level.

It seems only one of your exit relays is affected by a general overload, right? So, it's not clear whether you see the same DNS overload issue other folks are reporting, given that one would expect to see that on all of your relays. Maybe that's a different overload you are seeing which is worth investigating?

Tor does indeed make requests to non-existent domains. In short, that is to test whether your resolver is behaving as it is supposed to. If you are interested in what tor is actually doing here, dns_launch_correctness_checks() in dns.c[1] is the entry point and your friend.

Georg

[1] src/feature/relay/dns.c · main · The Tor Project / Core / Tor · GitLab


Imre Jonk:

Should I just ignore Tor Metrics saying that my relay is overloaded and
the Tor log saying that the DNS timeouts are above threshold? I
understand that DNS issues are really bad for UX so I want to fix this
if possible.

If the overload is related to non-DNS issues, please address it. For the DNS case it is currently a bit tricky. We are actively investigating what is going on and suspect we are dealing with a bunch of different issues leading to the DNS timeouts you and others are seeing. E.g. there might still be bugs in our code and there is probably blacklisting of DNS requests stemming from Tor related IP addresses involved and likely things we do not fully understand yet.

So, until we have gotten down to the root(s) of the DNS timeout problem and have a clear understanding of what is going on and how to fix things, I'd say please ignore the problem for now. We heard that having the local resolver use non-Tor IP addresses does make a difference timeout-wise[1], which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level angle I mentioned above. Thus, you could set that up if you have not already.

Some folks might consider switching to non-exit nodes to just get rid of the overload message. Please bear with us while we are debugging the problem and don't do that. :) We'll keep this list in the loop.

Thanks,
Georg

[1] Add new recommendations to DNS on exit relays section (#239) · Issues · The Tor Project / Web / community · GitLab


Some folks might consider switching to non-exit nodes to just get rid of
the overload message. Please bear with us while we are debugging the
problem and don't do that. :) We'll keep this list in the loop.

The undocumented configuration option 'OverloadStatistics' can be used to disable the reporting of an overloaded state. E.g. place the following in your torrc:

OverloadStatistics 0

May be worth considering until the reporting feature becomes a bit more mature and the issues around DNS resolution become a bit clearer.


Arlen Yaroslav via tor-relays:

The undocumented configuration option 'OverloadStatistics'

related:


--
https://nusenu.github.io

If the overload is related to non-DNS issues, please address it. For the DNS
case it is currently a bit tricky. We are actively investigating what is
going on and suspect we are dealing with a bunch of different issues leading
to the DNS timeouts you and others are seeing. E.g. there might still be
bugs in our code and there is probably blacklisting of DNS requests stemming
from Tor related IP addresses involved and likely things we do not fully
understand yet.

So, I think until we got down to the root(s) of the DNS timeout problem and
have a clear understanding about what is going on and how to fix things I'd
say please ignore the problem for now. We heard that having the local
resolver using non-Tor IP addresses does make a difference timeout-wise[1]
which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level
angle I mentioned above. Thus, you could set up that if you have not
already.

Thanks, I'll keep an eye on this list for further developments on this topic.

To clarify, I'm currently using my colocation network's DNS resolver. The
fallback is Hurricane Electric's anycast resolver. Both perform DNSSEC
validation.

Some folks might consider switching to non-exit nodes to just get rid of the
overload message. Please bear with us while we are debugging the problem and
don't do that. :) We'll keep this list in the loop.

Don't worry, this is not something I would quit running an exit for :)


Greetings everyone!

We wanted to follow up with all of you on this. It has been a while but we
finally got down to the problem.

We made this ticket public which is where we pulled together the information
we had from Exit operators helping us in private:

You can find here the summary of the problem:

The gist is that tor imposes a 5-second timeout, basically telling libevent
to give up on a DNS resolve after 5 seconds. It will do that 3 times
before an error is returned to tor.

That very error is a "DNS TIMEOUT" which is what we expose on the MetricsPort
and also use for the overload general indicator.
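
To make the arithmetic concrete, here is a toy model of the behavior described above. It is not tor or libevent code; the function name and the way response times are passed in are assumptions made purely for illustration. Each attempt is abandoned after the per-attempt timeout, and only when all attempts are used up is a "timeout" reported.

```python
from typing import Optional, Sequence, Tuple


def simulate_lookup(
    response_times_s: Sequence[Optional[float]],
    timeout_s: float = 5.0,
    attempts: int = 3,
) -> Tuple[str, float]:
    """Simulate a lookup with a per-attempt timeout and a fixed number of attempts.

    response_times_s[i] is how long the resolver would take to answer
    attempt i, or None if it never answers. Returns (result, total_wait).
    """
    elapsed = 0.0
    for i in range(attempts):
        rt = response_times_s[i] if i < len(response_times_s) else None
        if rt is not None and rt <= timeout_s:
            # An answer arrived within the allowed window for this attempt.
            return ("ok", elapsed + rt)
        # The attempt is abandoned once the timeout has passed.
        elapsed += timeout_s
    return ("timeout", elapsed)
```

With the values from this message (5 seconds, 3 attempts), a resolver that never answers yields ("timeout", 15.0). A resolver that would answer every attempt after 6 seconds yields the same result even though it is working, which is the "took too long for the parameters I have" case.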

The problem lies with that very error. It is in fact _not_ a "real" DNS
timeout but rather just "took too long for the parameters I have". So these
timeouts should more be seen as a "UX issue" rather than "network issue".

For that reason, we will remove the DNS timeout from the overload general
indicator and we will rename also the "dns timeout" metrics on the MetricsPort
to something with a more meaningful name.

Operators can still use the DNS metrics to monitor health of the DNS by
looking at all other possible errors especially "serverfailed".
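
As a rough sketch of how an operator might watch the non-timeout errors, the snippet below sums counters by a label from text in the Prometheus exposition format, which is the kind of text a metrics endpoint serves. The metric name and label names in SAMPLE are placeholders invented for this example; check the actual output of your own MetricsPort for the real names. The parser is deliberately simple and does not handle label values that contain a closing brace.

```python
import re
from collections import defaultdict
from typing import Dict

_SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Placeholder data: names are hypothetical, not copied from a real relay.
SAMPLE = '''
# HELP example_dns_error_total placeholder help text
example_dns_error_total{record="A",reason="timeout"} 12
example_dns_error_total{record="AAAA",reason="timeout"} 3
example_dns_error_total{record="A",reason="serverfailed"} 2
example_other_total 99
'''


def sum_by_label(text: str, metric: str, label: str) -> Dict[str, float]:
    """Sum the samples of one metric, grouped by the value of one label."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        if label in labels:
            totals[labels[label]] += float(m.group("value"))
    return dict(totals)
```

Grouping by the reason label makes it easy to alert on a rising "serverfailed" count while treating the timeout count as a UX signal, in line with the advice above.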

Finally, we will most likely also bring the Tor DNS timeout down from 5
seconds to 1 second in order to improve UX:

We will likely fix this in the current 0.4.7.x development version and backport
it into 0.4.6 stable. The release timeline is still to come, but we hope it
will be as soon as possible.

Thanks everyone for your help, feedback and patience with this problem! In
particular, thanks a lot to Anders Trier for their help and providing us with
an Exit relay we could experiment with and toralf for providing so much useful
information from their relays.

Cheers!
David


To stop confusing operators it would make sense to remove the

"This relay is overloaded since"

banner from Relay Search for all tor versions prior to
0.4.6.9 and 0.4.7.3-alpha, no?

kind regards,
nusenu


I agree, it's kind of pointless if you already know the issue.

Thanks,
John C.


+1
Very wise suggestion.

--
╰_╯ Ciao Marco!

Debian GNU/Linux

It's free software and it gives you freedom!

nusenu:

To stop confusing operators it would make sense to remove the

"This relay is overloaded since"

banner from Relay Search for all tor versions prior to
0.4.6.9 and 0.4.7.3-alpha, no?

Well, not all potential overload is DNS related overload. There are a bunch of different criteria for emitting a general overload warning.
Onionoo and this relay search have a hard time differentiating between DNS related (general) overload and other (general) overload. Thus, I don't think this change is easy to make.

I think the best option here is to upgrade swiftly to 0.4.6.9/0.4.7.3-alpha.

That said, we should update our documentation accordingly. I've filed a ticket for that.[1]

Georg

[1] Update overload article (#279) · Issues · The Tor Project / Web / support · GitLab
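
For operators managing several relays, the version cut-off discussed above (0.4.6.9 and 0.4.7.3-alpha) can be sketched as a small check. This is an illustration, not an official tool: the function names are made up for this example, only the leading four numeric components of the version string are compared, and anything older than the 0.4.6 series is simply reported as lacking the fix, whether or not it is affected.

```python
import re
from typing import Tuple


def parse_tor_version(version: str) -> Tuple[int, ...]:
    """Extract the four leading numeric components, e.g. '0.4.7.3-alpha' -> (0, 4, 7, 3)."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unrecognized tor version: {version!r}")
    return tuple(int(part) for part in m.groups())


def has_dns_overload_fix(version: str) -> bool:
    """True if the version is at least 0.4.6.9 (0.4.6 series) or 0.4.7.3 (later series)."""
    v = parse_tor_version(version)
    if v[:3] == (0, 4, 6):
        return v >= (0, 4, 6, 9)
    return v >= (0, 4, 7, 3)
```

For example, has_dns_overload_fix("0.4.6.8") is False, while has_dns_overload_fix("0.4.6.9") and has_dns_overload_fix("0.4.7.3-alpha") are True.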

Georg Koppen:

Well, not all potential overload is DNS related overload. There are a
bunch of different criteria for emitting a general overload warning.
Onionoo and this relay search have a hard time differentiating
between DNS related (general) overload and other (general) overload.
Thus, I don't think this change is easy to make.

To have the DNS trigger included in a shared trigger info
was a deliberate design decision as I understood it.

In my opinion it is better to remove this notice from Relay Search
for all affected versions, even if it will also remove the warning in
cases where the trigger was not DNS related, because
it potentially causes alarm fatigue and operators will continue
to ignore the banner even after it has been improved.

I think the best option here is to upgrade swiftly to
0.4.6.9/0.4.7.3-alpha.

That is not easy for all of the operators that use the Torproject's Debian repos
since these versions are usually not "swiftly" available on deb.torproject.org yet
(unless you switch to nightly packages which I wouldn't recommend).
currently: Version: 0.4.6.8-1~d10.buster+1 [1]

kind regards,
nusenu

[1] https://deb.torproject.org/torproject.org/dists/buster/main/binary-amd64/Packages


It would be nice if we could make the DNS timeout percentage threshold higher in our config file so Tor isn't reporting our exit relays as overloaded.

bobby stickel:

It would be nice if we could make the DNS timeout percentage threshold higher
in our config file so Tor isn't reporting our exit relays as overloaded.

If you run Debian and use deb.torproject.org packages, running

apt update && apt upgrade

should be your solution now, since the stable repo has been updated to
tor 0.4.6.9 and the experimental repo contains 0.4.7.3-alpha, which also
includes your desired change.

kind regards,
nusenu


Well, I have to say that thanks to the update to tor 0.4.6.9 the DNS overload issue is gone. My consensus weight went down slightly due to the constant overload flag. Let's see if time will help heal that.

Good work so far.

Thanks,
John C.


Hey all, I wanted to chime in on this thread because I’m suddenly seeing DNS “Overload” errors (and corresponding notices that my system is overloaded on prometheus) lately as well.

The hardware and OS and configs for my public exit haven’t changed - what has changed is that I upgraded tor itself, and added ipv6.

I suspect a decent amount of my DNS failures are actually lookups for AAAA records that don’t exist, because my exit supports v6 but the destination site doesn’t, or only half-configured it.

The system itself is definitely NOT overloaded. ( load averages: 0.07, 0.23, 0.24 )


AMuse:

Hey all, I wanted to chime in on this thread because I'm suddenly seeing
DNS "Overload" errors (and corresponding notices that my system is
overloaded on prometheus) lately as well.

The hardware and OS and configs for my public exit haven't changed - what
has changed is that I upgraded tor itself, and added ipv6.

You appear to be running tor 0.4.6.8 on FreeBSD.

As has been previously stated on this thread the code involved
has been changed in tor 0.4.6.9 and 0.4.7.3-alpha.

So you will have to wait till FreeBSD ports ship that version.

After upgrading the tor version, the overload indicator
will disappear if it was DNS related (there can be other reasons).

kind regards,
nusenu
