[tor-relays] General overload -> DNS timeouts

nusenu:

Anders Trier Olesen:

The Tor relay guide should recommend running your recursive resolver
(unbound) on a different IP than your exit:
Tor Project | Exit

yes, that is a good idea, here is a PR for it:

add new recommendations to DNS on Exit Relays section by nusenu · Pull Request #169 · torproject/community · GitHub

Thanks. I created a ticket[1] for it in our bug tracker, so your PR does not fall through the cracks.

Georg

[1] Add new recommendations to DNS on exit relays section (#239) · Issues · The Tor Project / Web / community · GitLab

I've been scratching my head with this as well. My exit family is shown
as overloaded on Tor Metrics [1]. All four instances run on one OpenBSD
box with ~50% CPU utilization. I've tried a local Unbound resolver as
well as the resolver provided by my colocation network, but the Tor log
and the metrics port keep showing ~1.5% DNS timeouts. I myself don't
notice any DNS issues, but I'm not actively monitoring it. The metrics
port and Tor log don't show any other issues besides DNS timeouts.

I don't know what the default OpenBSD DNS timeout is. It's not
configurable in /etc/resolv.conf, nor is it described in its man page.
My own testing shows that an nslookup timeout takes 15 seconds.

Should I just ignore Tor Metrics saying that my relay is overloaded and
the Tor log saying that the DNS timeouts are above threshold? I
understand that DNS issues are really bad for UX so I want to fix this
if possible.

Thanks,

Imre

[1] Relay Search

On Tue, Nov 09, 2021 at 06:25:31AM -0500, John Csuti via tor-relays wrote:

Hello all,

I would have to agree on this; it appears that the DNS failure timeout is
too low. I have more than enough bandwidth to host Tor exit nodes and
my own Unbound full recursive resolver, and yet I still get the timeout
message at 1-1.5%. Sometimes even weird amounts such as 40-50%.

I have been working with a few people on this issue and nothing we have
tried has fixed this. The other thing is that all other servers I run
have no issue with DNS timeouts. It appears to only be a Tor issue. I
would even say that some DNS queries that Tor makes are to taken-down
sites, fake sites or non-existent domains.

My big family shows the same behavior: in the metrics, all relays turned "yellow" after updating the tor software.

Olaf

On 17.11.21 at 19:38, Imre Jonk wrote:


_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

I get that too. I've noticed that Tor makes a lot of requests to non-existent domains. I run a Pi-hole DNS without the ad blocking. I think this is a bug. They should at least give us the ability to control the warning level.


bobby stickel:

I get that too. I've noticed that Tor makes a lot of requests to non-existent
domains. I run a Pi-hole DNS without the ad blocking. I think this is a bug. They
should at least give us the ability to control the warning level.

It seems only one of your exit relays is affected by a general overload, right? So, it's not clear whether you see the same DNS overload issue other folks are reporting, given that one would expect to see that on all of your relays. Maybe that's a different overload you are seeing which is worth investigating?

Tor does indeed make requests to non-existent domains. In short, that is to test whether your resolver is behaving as it is supposed to. If you are interested in what tor is actually doing here, then dns_launch_correctness_checks() in dns.c[1] is the entry point and your friend.

Georg

[1] src/feature/relay/dns.c · main · The Tor Project / Core / Tor · GitLab
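To illustrate the idea of probing a resolver with names that should not exist, here is a minimal Python sketch. It is not tor's implementation: the function names, the `.invalid` suffix, and the number of attempts are assumptions chosen for the example. A resolver that returns an address for a random name under a reserved suffix is rewriting "no such domain" answers.

```python
import random
import socket
import string
from typing import Callable, Optional

# A resolver is modelled as a function: hostname -> address, or None if the
# name does not resolve. (Hypothetical type alias for this sketch.)
Resolver = Callable[[str], Optional[str]]


def random_hostname(rng: random.Random, suffix: str = ".invalid", length: int = 12) -> str:
    """Build a random hostname that should never resolve."""
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return label + suffix


def system_resolve(name: str) -> Optional[str]:
    """Resolve via the system resolver; return None when the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def resolver_answers_bogus_names(resolve: Resolver, attempts: int = 5,
                                 seed: Optional[int] = None) -> bool:
    """Return True if any random, should-not-exist name gets an answer."""
    rng = random.Random(seed)
    return any(resolve(random_hostname(rng)) is not None for _ in range(attempts))
```

Calling `resolver_answers_bogus_names(system_resolve)` performs real lookups against whatever resolver the host is configured to use; passing a stub function instead keeps the check offline.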


Imre Jonk:


Should I just ignore Tor Metrics saying that my relay is overloaded and
the Tor log saying that the DNS timeouts are above threshold? I
understand that DNS issues are really bad for UX so I want to fix this
if possible.

If the overload is related to non-DNS issues, please address it. For the DNS case it is currently a bit tricky. We are actively investigating what is going on and suspect we are dealing with a bunch of different issues leading to the DNS timeouts you and others are seeing. For example, there might still be bugs in our code, DNS requests stemming from Tor-related IP addresses are probably being blacklisted, and there are likely things we do not fully understand yet.

So, until we have gotten down to the root(s) of the DNS timeout problem and have a clear understanding of what is going on and how to fix things, I'd say please ignore the problem for now. We heard that having the local resolver use non-Tor IP addresses does make a difference timeout-wise[1], which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level angle I mentioned above. Thus, you could set that up if you have not already.

Some folks might consider switching to non-exit nodes to just get rid of the overload message. Please bear with us while we are debugging the problem and don't do that. :) We'll keep this list in the loop.

Thanks,
Georg

[1] Add new recommendations to DNS on exit relays section (#239) · Issues · The Tor Project / Web / community · GitLab


> Some folks might consider switching to non-exit nodes to just get rid of
> the overload message. Please bear with us while we are debugging the
> problem and don't do that. :) We'll keep this list in the loop.

The undocumented configuration option 'OverloadStatistics' can be used to disable the reporting of an overloaded state. E.g. place the following in your torrc:

OverloadStatistics 0

May be worth considering until the reporting feature becomes a bit more mature and the issues around DNS resolution become a bit clearer.


Arlen Yaroslav via tor-relays:

The undocumented configuration option 'OverloadStatistics'

related:

--
https://nusenu.github.io

On Thu, Nov 18, 2021 at 08:30:16AM +0000, Georg Koppen wrote:

> If the overload is related to non-DNS issues, please address it. For the DNS
> case it is currently a bit tricky. We are actively investigating what is
> going on and suspect we are dealing with a bunch of different issues leading
> to the DNS timeouts you and others are seeing. E.g. there might still be
> bugs in our code and there is probably blacklisting of DNS requests stemming
> from Tor related IP addresses involved and likely things we do not fully
> understand yet.
>
> So, I think until we got down to the root(s) of the DNS timeout problem and
> have a clear understanding about what is going on and how to fix things I'd
> say please ignore the problem for now. We heard that having the local
> resolver using non-Tor IP addresses does make a difference timeout-wise[1]
> which seems related to the Tor-IP-addresses-getting-blocked-at-DNS-level
> angle I mentioned above. Thus, you could set up that if you have not
> already.

Thanks, I'll keep an eye on this list for further developments on this topic.

To clarify, I'm currently using my colocation network's DNS resolver. The
fallback is Hurricane Electric's anycast resolver. Both perform DNSSEC
validation.

> Some folks might consider switching to non-exit nodes to just get rid of the
> overload message. Please bear with us while we are debugging the problem and
> don't do that. :) We'll keep this list in the loop.

Don't worry, this is not something I would quit running an exit for :)

Greetings everyone!

We wanted to follow up with all of you on this. It has been a while but we
finally got down to the problem.

We made this ticket public which is where we pulled together the information
we had from Exit operators helping us in private:

You can find here the summary of the problem:

The gist is that tor imposes a 5-second timeout, basically telling libevent
to give up on a DNS resolve after 5 seconds. It will do that 3 times
before an error is returned to tor.
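As a small worked example of the arithmetic above (a sketch only, not tor's code; the function name is made up for illustration): with a 5-second timeout per attempt and 3 attempts, a resolve can take up to 15 seconds before the error reaches tor.

```python
def worst_case_dns_wait(timeout_seconds: float, attempts: int) -> float:
    """Upper bound on the wait before a timeout error is returned."""
    if timeout_seconds < 0 or attempts < 1:
        raise ValueError("timeout must be >= 0 and attempts must be >= 1")
    return timeout_seconds * attempts


# Values described in this message: 5 seconds per attempt, 3 attempts.
print(worst_case_dns_wait(5, 3))  # 15
```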

That very error is a "DNS TIMEOUT" which is what we expose on the MetricsPort
and also use for the overload general indicator.

The problem lies with that very error. It is in fact _not_ a "real" DNS
timeout but rather just "took too long for the parameters I have". So these
timeouts should be seen more as a "UX issue" than a "network issue".

For that reason, we will remove the DNS timeout from the overload general
indicator and we will also rename the "dns timeout" metrics on the MetricsPort
to something more meaningful.

Operators can still use the DNS metrics to monitor the health of DNS by
looking at all the other possible errors, especially "serverfailed".
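One way to watch those other errors is to tally the DNS error counters by their reason label. The Python sketch below parses text in the Prometheus exposition format; the metric name, label names, and numbers in the sample are assumptions for illustration, so compare them against what your own MetricsPort actually prints.

```python
import re
from collections import defaultdict

# Sample input. The metric name, labels, and values are illustrative
# assumptions, not output captured from a real relay.
SAMPLE = '''\
# TYPE tor_relay_exit_dns_error_total counter
tor_relay_exit_dns_error_total{record="A",reason="serverfailed"} 120
tor_relay_exit_dns_error_total{record="A",reason="timeout"} 150
tor_relay_exit_dns_error_total{record="AAAA",reason="serverfailed"} 30
'''

# Matches lines of the form: name{label="value",...} number
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def count_by_reason(text: str, metric: str) -> dict:
    """Sum the samples of `metric` per `reason` label, across all other labels."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skips comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "reason" in labels:
            totals[labels["reason"]] += float(match.group("value"))
    return dict(totals)


totals = count_by_reason(SAMPLE, "tor_relay_exit_dns_error_total")
for reason, count in sorted(totals.items()):
    print(f"{reason}: {count:.0f}")
```

Feeding this function the text fetched from the MetricsPort at regular intervals, and comparing the "serverfailed" count between runs, gives a rough trend without a full monitoring stack.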

Finally, we will most likely also bring down the Tor DNS timeout from 5
seconds to 1 second in order to improve UX:

We will likely fix this in the current 0.4.7.x development version and backport
it into 0.4.6 stable. The release timeline is still to come, but we hope it
will be as soon as possible.

Thanks everyone for your help, feedback and patience with this problem! In
particular, thanks a lot to Anders Trier for their help and providing us with
an Exit relay we could experiment with and toralf for providing so much useful
information from their relays.

Cheers!
David

On 18 Nov (10:01:09), Arlen Yaroslav via tor-relays wrote:


To stop confusing operators it would make sense to remove the

"This relay is overloaded since"

banner from Relay Search for all tor versions prior to
0.4.6.9 and 0.4.7.3-alpha, no?

kind regards,
nusenu


I agree, it's kinda pointless if you know the issue already…

Thanks,
John C.


+1
Very wise suggestion.

On Thursday, December 16, 2021 2:41:27 PM CET nusenu wrote:


--
╰_╯ Ciao Marco!

Debian GNU/Linux

It's free software and it gives you freedom!

nusenu:

To stop confusing operators it would make sense to remove the

"This relay is overloaded since"

banner from Relay Search for all tor versions prior to
0.4.6.9 and 0.4.7.3-alpha, no?

Well, not all potential overload is DNS-related overload. There are a bunch of different criteria for emitting a general overload warning.
Onionoo and this relay search have a hard time differentiating between DNS-related (general) overload and other (general) overload. Thus, I don't think this change is easy to make.

I think the best option here is to upgrade swiftly to 0.4.6.9/0.4.7.3-alpha.

That said, we should update our documentation accordingly. I've filed a ticket for that.[1]

Georg

[1] Update overload article (#279) · Issues · The Tor Project / Web / support · GitLab

Georg Koppen:

> Well, not all potential overload is DNS related overload. There are a
> bunch of different criteria for emitting a general overload warning.
> Onionoo and this relay search have a hard time differentiating
> between DNS related (general) overload and other (general) overload.
> Thus, I don't think this change is easy to make.

Having the DNS trigger included in a shared trigger info
was a deliberate design decision, as I understood it.

In my opinion it is better to remove this notice from Relay Search
for all affected versions, even if that also removes the warning in
cases where the trigger was not DNS related, because
it potentially causes alarm fatigue and operators will continue
to ignore the banner even after it has been improved.

> I think the best option here is to upgrade swiftly to
> 0.4.6.9/0.4.7.3-alpha.

That is not easy for all of the operators that use the Tor Project's Debian repos,
since these versions are usually not "swiftly" available on deb.torproject.org yet
(unless you switch to nightly packages, which I wouldn't recommend).
currently: Version: 0.4.6.8-1~d10.buster+1 [1]

kind regards,
nusenu

[1] https://deb.torproject.org/torproject.org/dists/buster/main/binary-amd64/Packages
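For operators who want to check a package version string against the two releases named in this thread, here is a rough Python sketch. The function names are made up for illustration, and it assumes that any series after 0.4.7 also carries the change, which the thread does not state explicitly.

```python
import re


def parse_tor_version(version: str) -> tuple:
    """Extract the leading four numeric components, ignoring any suffix
    such as "-alpha" or a Debian revision like "-1~d10.buster+1"."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)\.(\d+)", version)
    if not match:
        raise ValueError(f"unrecognised tor version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_dns_overload_change(version: str) -> bool:
    """True for 0.4.6.9 or later in the 0.4.6 series, or 0.4.7.3 and later."""
    parsed = parse_tor_version(version)
    if parsed[:3] == (0, 4, 6):
        return parsed >= (0, 4, 6, 9)
    # Assumption: later series inherit the change; earlier series do not.
    return parsed >= (0, 4, 7, 3)


for candidate in ("0.4.6.8-1~d10.buster+1", "0.4.6.9", "0.4.7.3-alpha"):
    print(candidate, has_dns_overload_change(candidate))
```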


It would be nice if we could make the DNS timeout percentage threshold higher in our config file so Tor isn't reporting our exit relays as overloaded.

bobby stickel:

It would be nice if we could make the DNS timeout percentage threshold higher
in our config file so Tor isn't reporting our exit relays as overloaded.

If you run Debian and use deb.torproject.org packages,
running
apt update && apt upgrade
should be your solution now,
since the stable repo has been updated to
tor 0.4.6.9
and the experimental repo contains
0.4.7.3-alpha, which also includes your desired change.

kind regards,
nusenu


Well, I have to say that, thanks to the update to tor 0.4.6.9, the DNS overload issue is gone. My consensus weight went down slightly due to the constant overload flag. Let's see if time will help heal that.

Good work so far.

Thanks,
John C.

On 2021-12-21 07:39 AM, nusenu wrote:


Hey all, I wanted to chime in on this thread because I'm suddenly seeing DNS "Overload" errors (and corresponding notices that my system is overloaded on prometheus) lately as well.

The hardware and OS and configs for my public exit haven't changed - what has changed is that I upgraded tor itself, and added ipv6.

I suspect a decent amount of my DNS failures are actually lookups for AAAA records that don't exist, because my exit supports v6 but the destination site doesn't, or only half-configured it.

The system itself is definitely NOT overloaded. ( load averages: 0.07, 0.23, 0.24 )

On Fri, Dec 17, 2021 at 2:03 AM nusenu <nusenu-lists@riseup.net> wrote:


AMuse:

Hey all, I wanted to chime in on this thread because I'm suddenly seeing
DNS "Overload" errors (and corresponding notices that my system is
overloaded on prometheus) lately as well.

The hardware and OS and configs for my public exit haven't changed - what
has changed is that I upgraded tor itself, and added ipv6.

You appear to be running tor 0.4.6.8 on FreeBSD.

As has been previously stated on this thread the code involved
has been changed in tor 0.4.6.9 and 0.4.7.3-alpha.

So you will have to wait till FreeBSD ports ship that version.

After upgrading the tor version the overload indicator on
will disappear if it was DNS related (there can be other reasons).

kind regards,
nusenu
