[tor-relays] Flooding of unbound via resolve attempts

GeKo · March 10, 2022, 8:33am

Hello!

As you might know, we are doing regular (at the moment weekly) scans of exit nodes to find and help with misconfigurations or errors that could seriously affect Tor network usability and performance. So far, after over a year of scanning, we have typically found a single-digit number of exit relays per week with problems, mostly DNS configuration issues (unbound crashed, etc.).

However, this week we suddenly found almost 80 exit relays with malfunctioning DNS resolution[1], which was surprising. Additionally, the issue returned on some of the servers after they had been fixed. DrWhax (thanks!) pointed us to a possible explanation tweeted by the unredacted folks:

https://twitter.com/unredacted_org/status/1501458345219215363

It seems that someone (intentionally or not) is overwhelming unbound, leading to DNS resolution issues for those exit operators who run this local resolver, which we currently recommend.

We've opened a ticket[2] for further investigation, but I hope this email raises some awareness so that exit operators can keep an eye on the situation.

Feel free to add any insights you have to the ticket. Additionally, I bet that if someone shared how they monitor their exits for such a problem, a lot of exit operators would happily pick up that setup and the Tor network would win.
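One low-tech way to watch for this on an exit is to send a test query to the local resolver periodically and raise an alarm when it times out or returns SERVFAIL. The following is a minimal Python sketch of such a probe, assuming unbound listens on 127.0.0.1:53. It is not how our scanner works. It builds a bare DNS query by hand so it needs only the standard library, and it inspects only the reply header, not the answers.

```python
import random
import socket
import struct
import time
from typing import Optional, Tuple


def build_query(name: str, txid: int, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 query: one question, QCLASS IN, recursion desired."""
    # Header: ID, flags (0x0100 = RD bit), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    # (no validation of label length or characters in this sketch)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(name: str, server: str = "127.0.0.1", port: int = 53,
          timeout: float = 2.0) -> Tuple[bool, Optional[int], float]:
    """Send one A query; return (ok, rcode, elapsed_seconds)."""
    txid = random.randint(0, 0xFFFF)
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, txid), (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout: no answer at all
            return False, None, time.monotonic() - start
    elapsed = time.monotonic() - start
    if len(reply) < 12:  # shorter than a DNS header
        return False, None, elapsed
    rx_id, flags = struct.unpack("!HH", reply[:4])
    rcode = flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL
    return rx_id == txid and rcode == 0, rcode, elapsed
```

Calling something like `probe("torproject.org")` from cron every minute and alerting after several consecutive failures would catch a wedged resolver. Keep in mind that a cached name can still answer while uncached lookups fail, so the choice of test names and thresholds is up to each operator.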

Thanks,
Georg

[1] New round of contacting operators for DNS issues and badexiting problematic relays (03/07/2022) (#197) · Issues · The Tor Project / Network Health / Team · GitLab
[2] Flood of resolve attempts overwhelms unbound on relays (#30) · Issues · The Tor Project / Network Health / Analysis · GitLab

Andreas_Kempe · March 10, 2022, 10:48pm

Hello!

> It seems that someone (intentionally or not) is overwhelming unbound,
> leading to DNS resolution issues for those exit operators who run
> this local resolver, which we currently recommend.

I find it interesting that it is possible to crash or DoS unbound through
Tor circuits to an exit relay. I would have assumed that other factors
would become the bottleneck before unbound did. They posted some CPU
graphs in the Twitter thread, but it would have been interesting to see
some requests-per-second numbers, if anyone has any to share.
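For anyone who wants to produce such numbers: unbound keeps a cumulative query counter, total.num.queries, which `unbound-control stats_noreset` prints without resetting it. A rough requests/s figure is the difference between two readings divided by the time between them. A minimal Python sketch, where the `Sample` type is a hypothetical helper:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative counter (hypothetical helper type)."""
    timestamp: float     # seconds, e.g. from time.time()
    total_queries: int   # e.g. unbound's total.num.queries


def queries_per_second(prev: Sample, curr: Sample) -> float:
    """Average query rate between two readings of a cumulative counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.total_queries - prev.total_queries
    if delta < 0:
        # The counter went backwards, so unbound was restarted (or its stats
        # reset) in between; treat the new value as counted from zero.
        delta = curr.total_queries
    return delta / elapsed
```

The reset handling is similar in spirit to how Prometheus' rate() treats counter resets. Plain `unbound-control stats` zeroes the counters on every read by default, so mixing it with this approach would make every reading look like a reset.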

> Feel free to add any insights you have to the ticket. Additionally, I bet
> that if someone shared how they monitor their exits for such a problem,
> a lot of exit operators would happily pick up that setup and the Tor
> network would win.

I'm using Grafana + Prometheus + node_exporter to monitor my relays.
Grafana is a web UI for visualising data; Prometheus is a data
collector that scrapes data from node_exporter and stores it for
Grafana to fetch. node_exporter is a service that collects a bunch of
system metrics and presents them in the same format as Tor's new
metrics port.

(When I eventually get Tor daemons recent enough to return anything but
an empty response from the metrics port, I'll add them to Prometheus
for scraping as well.)

Grafana is great: one can build dashboards that show pertinent
information and give a good overview. It is also possible to configure
alerts that trigger when metrics go outside specified bounds. I have
alerts configured to mail me for a few statistics.
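Alerts like these usually fire only after a metric has stayed out of bounds for a while, so that a single slow scrape doesn't send mail. Prometheus alerting rules call this the `for` duration. The sketch below shows the same idea over consecutive samples. The class name and thresholds are hypothetical, and this is not Grafana's implementation.

```python
class ConsecutiveThresholdAlert:
    """Fire once a metric has exceeded `threshold` for `for_samples` samples in a row."""

    def __init__(self, threshold: float, for_samples: int) -> None:
        if for_samples < 1:
            raise ValueError("for_samples must be at least 1")
        self.threshold = threshold
        self.for_samples = for_samples
        self.breaches = 0  # consecutive out-of-bounds samples seen so far

    def observe(self, value: float) -> bool:
        """Record one sample; return True while the alert is firing."""
        if value > self.threshold:
            self.breaches += 1
        else:
            self.breaches = 0  # back in bounds: the alert resolves
        return self.breaches >= self.for_samples
```

For example, feeding it a per-minute fraction of failed DNS probes with `threshold=0.5, for_samples=3` would alert only after three bad minutes in a row. Sending the actual mail when `observe()` first returns True is left out.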

For unbound monitoring, I use unbound_exporter from the letsencrypt
project on GitHub[3]. It works the same way as node_exporter, but
exports unbound metrics for Prometheus to scrape. To visualise the
data, I use a pre-made Grafana dashboard[4] that I have tweaked a bit.
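To check the raw counters without an exporter, `unbound-control stats_noreset` prints unbound's statistics as key=value lines. Two of them are especially relevant to a flood. total.requestlist.exceeded counts queries dropped because the request list was full, and total.requestlist.overwritten counts pending requests replaced by newer ones. A minimal Python sketch that parses that output and reports how much those counters grew between two snapshots:

```python
from typing import Dict

# unbound counters that grow when the request list (queries being worked on)
# cannot keep up with incoming queries.
OVERLOAD_KEYS = ("total.requestlist.exceeded", "total.requestlist.overwritten")


def parse_unbound_stats(text: str) -> Dict[str, float]:
    """Parse key=value lines as printed by `unbound-control stats_noreset`."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip any non-numeric values
    return stats


def overload_delta(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """How much each overload counter grew between two snapshots."""
    return {k: curr.get(k, 0.0) - prev.get(k, 0.0) for k in OVERLOAD_KEYS}
```

The `text` argument would come from running `unbound-control stats_noreset` on a schedule, for example via `subprocess`. A steadily growing exceeded count means unbound is shedding queries.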

Cordially,
Andreas Kempe

[3] GitHub - letsencrypt/unbound_exporter: A Prometheus exporter for Unbound.
[4] Unbound Statistics | Grafana Labs


On Thu, Mar 10, 2022 at 08:33:07AM +0000, Georg Koppen wrote:
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

lists · March 11, 2022, 10:28pm

Yes, I had this rubbish on exits with unbound for a few days recently:

https://paste.debian.net/1233888/

On colocated machines¹ with 10G or faster uplinks, other exit operators
and I use PowerDNS + dnsdist.

¹A new server is currently on its way to the data center
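dnsdist sits in front of one or more recursive resolvers and spreads queries across them. Its default load-balancing policy, leastOutstanding, prefers the backend with the fewest queries still in flight, so one stalled resolver doesn't keep receiving new work. The Python sketch below illustrates only that core idea, with hypothetical backend names. It is not dnsdist's code, which also takes server order and latency into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str              # hypothetical label, e.g. "pdns-recursor-1"
    outstanding: int = 0   # queries sent but not yet answered


def pick_least_outstanding(backends: List[Backend]) -> Backend:
    """Choose the backend with the fewest in-flight queries (first wins ties)."""
    if not backends:
        raise ValueError("no backends configured")
    return min(backends, key=lambda b: b.outstanding)


def dispatch(backends: List[Backend]) -> Backend:
    """Account for a new query being sent to the chosen backend."""
    chosen = pick_least_outstanding(backends)
    chosen.outstanding += 1
    return chosen


def complete(backend: Backend) -> None:
    """Account for an answer (or a timeout) coming back from a backend."""
    backend.outstanding = max(0, backend.outstanding - 1)
```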


On Thursday, March 10, 2022 9:33:07 AM CET Georg Koppen wrote:

> It seems that someone (intentionally or not) is overwhelming unbound,
> leading to DNS resolution issues for those exit operators who run
> this local resolver, which we currently recommend.

--
╰_╯ Ciao Marco!

Debian GNU/Linux

It's free software and it gives you freedom!