[tor-relays] inet_csk_bind_conflict

Hi Christopher

How many open connections do you have? (ss -s)

Do you happen to use OutboundBindAddress in your torrc?

What I think we need is for the Tor developers to include this PR in a release: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579
Once that has happened, I think the problem should go away, as long as you run a recent enough Linux kernel that supports IP_BIND_ADDRESS_NO_PORT (since Linux 4.2).

  • Anders

On Fri, 2 Dec 2022 at 09:24, Christopher Sheats <yawnbox@emeraldonion.org> wrote:

···

Hello tor-relays,

We are currently using Ubuntu Server for our exit relays. Occasionally, exit throughput will drop from ~4Gbps down to ~200Mbps, and the only observable data point we have is a significant increase in inet_csk_bind_conflict, as seen via ‘perf top’, where it will hit 85% [kernel] utilization.

A while back we thought we had solved this with two /etc/sysctl.conf settings:

net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1

However we are still experiencing this problem.

Both of our (currently two) relay servers suffer from the same problem at the same time. They are AMD EPYC 7402P bare-metal servers, each with 96GB RAM and 20 exit relays. This issue persists after upgrading to 0.4.7.11.

Screenshots of perf top are shared here: https://digitalcourage.social/@EmeraldOnion/109440197076214023

Does anyone have experience troubleshooting and/or fixing this problem?

Cheers,


Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/



Hello,

Thank you for this information. After 24 hours of testing, these configurations brought Tor to a halt.

At first I started with just the sysctl modifications. After a few hours with those alone, there was no improvement; inet_csk_bind_conflict was still at ~75%. I then installed Torutils for both IPv4 and IPv6. After only a couple of hours, Tor dropped to below 15 Mbps across both servers (40 relays). 16 hours later, Tor dropped below 2 Mbps.

I’ve removed all of these new settings and restarted.

···


Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/

On Dec 2, 2022, at 7:30 AM, Chris <tor@wcbsecurity.com> wrote:

Hi,

As I’m sure you’ve already gathered, your system is maxing out trying to
deal with all the connection requests. When inet_csk_get_port is called
and the port is found to be occupied, inet_csk_bind_conflict is
called to resolve the conflict. So in normal circumstances you shouldn’t
see it in perf top, much less at 79%. There are two ways to deal with it,
and each method should be complemented by the other. One way is to
increase the number of ports and reduce the wait time, which you have
already tried to some extent. I would add the following:

net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_tw_buckets = 1200
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 8192

The complementary method is to lower the number of connection requests
by taking the frivolous ones out of the equation with a few iptables
rules.
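For illustration only (this is not one of the curated rule sets linked further down, and the numbers are arbitrary), a single connlimit rule of the kind meant here could look like:

iptables -A INPUT -p tcp --dport 443 --syn \
  -m connlimit --connlimit-above 4 --connlimit-mask 32 -j DROP

That caps concurrent connections per source /32 to the ORPort; the curated rules linked below are considerably more careful than this one-liner.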

I’m assuming the increased load you’re experiencing is due to the
current DDoS attacks. I’m not sure if you’re using anything to
mitigate them, but you should consider it.

You may find something useful at the following links:

[1] https://github.com/Enkidu-6/tor-ddos (iptables rules for Tor relay operators to mitigate DDoS)

[2] https://github.com/toralf/torutils (a few tools for a Tor relay)

[background] “Provide a recommended set of iptables/nftables rules to help in case of DoS attacks”, Tor Project Community/Support issue #40093

Cheers.

On 12/1/2022 3:35 PM, Christopher Sheats wrote:

···

Sorry to hear it wasn't much help. Even though the additions I suggested
didn't help they certainly couldn't cause any harm and can't be
responsible for the drops in traffic.

As for the torutils scripts, I'm sure toralf would be able to better
investigate that, but I have a feeling you have a particular setup that
might not have worked with the script. May I ask what your setup is?
Are you running your relays in separate VMs on the main system, or are
you using a different setup, like having all IP addresses on the same OS
and using OutboundBindAddress, routing, etc. to separate them? If I
know more, I might be able to make a script specific to your setup.

On 12/3/2022 2:07 PM, Christopher Sheats wrote:

···

server1:~$ ss -s
Total: 454644
TCP: 465840 (estab 368011, closed 36634, orphaned 7619, timewait 11466)

Transport  Total     IP        IPv6
RAW        0         0         0
UDP        48        48        0
TCP        429206    413815    15391
INET       429254    413863    15391
FRAG       0         0         0

81% inet_csk_bind_conflict

server2:~$ ss -s
Total: 460089
TCP: 477026 (estab 367786, closed 42817, orphaned 7456, timewait 17239)

Transport  Total     IP        IPv6
RAW        0         0         0
UDP        71        71        0
TCP        434209    418235    15974
INET       434280    418306    15974
FRAG       1         1         0

80% inet_csk_bind_conflict

(Total combined throughput at the time of measurement was ~650 Mbps symmetrical per transit-provider metrics; this low throughput is common when inet_csk_bind_conflict is this high.)

Re OutboundBindAddress - yes, for both v4 and v6

Re kernel version - 5.15.0-56-generic (jammy). Foundation for Applied Privacy recommended that we try the nightly repo which apparently includes the IP_BIND_ADDRESS_NO_PORT change. However that merge request mentions a workaround of modifying net.ipv4.ip_local_port_range, which we’ve already performed.
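For reference, both the sysctl workaround and the kernel-side support can be checked quickly (the grep assumes the kernel header package is installed):

sysctl net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse
grep IP_BIND_ADDRESS_NO_PORT /usr/include/linux/in.h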

···


Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/

On Dec 3, 2022, at 3:02 AM, Anders Trier Olesen <anders.trier.olesen@gmail.com> wrote:

···

> May I ask what your set up is?
> Are you running your relays on separate VMs on the main system or are
> you using a different set up like having all IP addresses on the same OS
> and using OutboundBindAddress, routing, etc… to separate them? If I
> know more, I might be able to make a script specific to your set up.

Thank you. Yes, of course.

Ubuntu server 22.04 runs on bare metal. Ansible-relayor manages 20 exit relays on each system. Netplan has each IP individually listed (sub-divided as a /25 per server from within a dedicated /24, similarly for v6 addresses). I believe an available IP is randomly picked by ansible-relayor and used statically in each torrc file.
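For instance, the per-IP listing in netplan looks roughly like this (an illustrative fragment only; the file name and interface name are made up, and the addresses echo the example torrc below):

# /etc/netplan/60-exit-addresses.yaml (illustrative)
network:
  version: 2
  ethernets:
    eno1:
      addresses:
        - 23.129.64.130/25
        - 23.129.64.131/25
        - "2620:18c:0:192::130/64"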

Here is an example torrc:

# ansible-relayor generated torrc configuration file
# Note: manual changes will be OVERWRITTEN on the next ansible-playbook run

OfflineMasterKey 1
RunAsDaemon 0
Log notice syslog
OutboundBindAddress 23.129.64.130
SocksPort 0
User _tor-23.129.64.130_443
DataDirectory /var/lib/tor-instances/23.129.64.130_443
ORPort 23.129.64.130:443
ORPort [2620:18c:0:192::130]:443
OutboundBindAddress [2620:18c:0:192::130]
DirPort 23.129.64.130:80
Address 23.129.64.130
SyslogIdentityTag 23.129.64.130_443
ControlSocket /var/run/tor-instances/23.129.64.130_443/control GroupWritable RelaxDirModeCheck
Nickname ageis
ContactInfo url:emeraldonion.org proof:uri-rsa ciissversion:2 tech@emeraldonion.org
Sandbox 1
NoExec 1
# we are an exit relay!
ExitRelay 1
IPv6Exit 1
DirPort [2620:18c:0:192::130]:80 NoAdvertise
DirPortFrontPage /etc/tor/instances/tor-exit-notice.html
ExitPolicy reject 23.129.64.128/25:*,reject6 [2613:18c:0:192::]/64:*,accept *:*,accept6 *:*
MyFamily <snip>
# end of torrc

···


Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/

On Dec 4, 2022, at 10:08 PM, Chris <tor@wcbsecurity.com> wrote:

···

Excellent. Thank you.

Yes, a blanket iptables rule is not going to work well in this setup, as
it pools all connections to all IP addresses into one. So if we accept 4
connections to port 443, a blanket iptables rule accepts 4 connections
to all IP addresses combined and drops everything else, and of course
that brings your server to a halt.

In another thread in this mailing list, they had the same situation and
I put a script together yesterday that you're welcome to try if you
wish. Not sure if they've tried it yet or what the result has been. But
the script is set up to apply the rules to two IP addresses at a time
and leave the rest alone. So you can apply to two addresses on your
server, assess the result and then either expand to the rest or stop
altogether.

The script makes a backup of your existing iptables rules. All you have
to do is restore it and everything goes back to how it was, without
having to reboot. It also specifically uses the mangle table and the
PREROUTING chain, so it won't interfere with your existing rules. That
should reduce the number of used ports as well. Flushing the mangle
table will also get rid of these rules and you're back to how it was before.
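In other words, rolling back is just standard iptables plumbing (the backup path is a placeholder for wherever the script saved it):

iptables-restore < /path/to/iptables-backup
# or simply flush the mangle table the script populates:
iptables -t mangle -F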

You can get it here:

https://raw.githubusercontent.com/Enkidu-6/tor-ddos/dev/multiple/multi-addr.sh

Simply choose two of your IP addresses and the ORPort for each and run
the script.

If it does what you expect it to do, all you have to do is change the
IP addresses and run the script again until all your addresses are
covered. Please save the iptables backup somewhere else, as the second
time you run the script the original backup will be overwritten.

If one of your IP addresses has two ORPorts, the above script won't work
and you should use the script below:

https://raw.githubusercontent.com/Enkidu-6/tor-ddos/dev/multiple/two-or.sh

Best of luck and I hope this helps.

···

On 12/5/2022 3:48 PM, Christopher Sheats wrote:

···

Like I wrote in [1], I think it would be interesting to hear if the
patch from pseudonymisaTor in ticket #26646 [2] would be of any help in
the given situation. The patch allows an exit operator to specify a
range of IP addresses for binding purposes for outbound connections. I
would think this could split the load wasted on trying to resolve port
conflicts in the kernel amongst the set of IPs you have available for
outbound connections.

All the best,
Alex.

[1]: https://mastodon.social/@ahf/109382411984106226
[2]: https://gitlab.torproject.org/tpo/core/tor/-/issues/26646 (add support for multiple OutboundBindAddressExit IP(ranges))
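For context, stock tor already has a single-address form of this knob, OutboundBindAddressExit; a minimal torrc fragment (address borrowed from the example torrc earlier in the thread) is:

OutboundBindAddressExit 23.129.64.130

The patch in #26646 would extend this so that multiple addresses or a range can be given and spread across outbound exit connections.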

···

On 2022/12/01 20:35, Christopher Sheats wrote:

> Does anyone have experience troubleshooting and/or fixing this problem?

--
Alexander Færøy

On Fri, Dec 09, 2022 at 09:47:07AM +0000, Alexander Færøy wrote:

···

This sounds similar to a problem we faced with the main Snowflake
bridge. After usage passed a certain threshold, we started getting
constant EADDRNOTAVAIL, not on the outgoing connections to middle nodes,
but on the many localhost TCP connections used by the pluggable
transports model.

https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40198
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40201

Long story short, the only mitigation that worked for us was to bind
sockets to an address (with port number unspecified, and with
IP_BIND_ADDRESS_NO_PORT *unset*) before connecting them, and use
different 127.0.0.0/8 addresses or ranges of addresses in different
segments of the communication chain.

https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/120
https://gitlab.torproject.org/dcf/extor-static-cookie/-/commit/a5c7a038a71aec1ff78d1b15888f1c75b66639cd
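A minimal sketch of that bind-before-connect pattern, using placeholder loopback addresses rather than the bridge's real layout:

import socket

# Placeholder listener standing in for a local ExtORPort.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

# Bind the outgoing socket's source address explicitly, with port 0 and
# WITHOUT IP_BIND_ADDRESS_NO_PORT: a concrete source port is assigned here,
# at bind() time, rather than later inside connect().
out = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
out.bind(("127.0.3.1", 0))      # one of several 127.0.0.0/8 source addresses
out.connect(srv.getsockname())
print("connected from", out.getsockname())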

IP_BIND_ADDRESS_NO_PORT was mentioned in another part of the thread
(https://lists.torproject.org/pipermail/tor-relays/2022-December/020895.html).
For us, this bind option *did not help* and in fact we had to apply a
workaround for Haproxy, which has IP_BIND_ADDRESS_NO_PORT hardcoded.
*Why* that should be the case is a mystery to me, as is why it is true
that bind-before-connect avoids EADDRNOTAVAIL even when the address
manually bound to is the very same address the kernel would have
automatically assigned. I even spent some time reading the Linux 5.10
source code trying to make sense of it. In the source code I found, or
at least think I found, code paths for the behavior I observed; but the
behavior seems to go against how bind and IP_BIND_ADDRESS_NO_PORT are
documented to work.

https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40201#note_2839472


Although my understanding of what Linux is doing is very imperfect, my
understanding is that both of these questions have the same answer:
port number assignment in `connect` when called on a socket not yet
bound to a port works differently than in `bind` when called with a
port number of 0. In case (1), the socket is not bound to a port
because you haven't even called `bind`. In case (2), the socket is not
bound to a port because haproxy sets the `IP_BIND_ADDRESS_NO_PORT`
sockopt before calling `bind`. When you call `bind` *without*
`IP_BIND_ADDRESS_NO_PORT`, it causes the port number to be bound
before calling `connect`, which avoids the code path in `connect` that
results in `EADDRNOTAVAIL`.

I am confused by these results, which are contrary to my understanding
of what `IP_BIND_ADDRESS_NO_PORT` is supposed to do, which is
precisely to avoid the problem of source address port exhaustion by
deferring the port number assignment until the time of `connect`, when
additional information about the destination address is available. But
it's demonstrable that binding to a source port before calling
`connect` avoids `EADDRNOTAVAIL` errors in our use cases, whatever the
cause may be.


Hi again

I took another look at this problem, and now I’m even more convinced that what we really need is IP_BIND_ADDRESS_NO_PORT. Here’s why.

If torrc OutboundBindAddress is configured, tor calls bind(2) on every outgoing connection:

https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.7.12/src/core/mainloop/connection.c#L2245
with sockaddr_in.sin_port set to 0 on #L2438.

The kernel doesn’t know that we’ll not be using this socket for listen(2), so it attempts to find an unused local two-tuple (actually a three-tuple: <protocol, source ip, source port>):

The bind syscall is handled by inet_bind:

https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L438
which calls __inet_bind that in turn calls sk->sk_prot->get_port on #L531 (notice the if on #L529).

get_port is implemented by inet_csk_get_port in inet_connection_sock.c:
https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/inet_connection_sock.c#L362
On #L375, we call inet_csk_find_open_port (defined on #L190) to find a free port.

inet_csk_find_open_port gets the local port range on #L206 (i.e. net.ipv4.ip_local_port_range), selects a random starting point (#L222), and loops through all the ports until it finds one that is free (#L230). For every port candidate, if it is already in use (#L240) it calls inet_csk_bind_conflict (#L241), which is defined on #L133. As far as I understand, it is inet_csk_bind_conflict’s job to determine whether it is safe to bind to the port anyway (e.g. the existing connection could be in TCP_TIME_WAIT with SO_REUSEPORT set on the socket). This is where your servers spend so much time. Increasing net.ipv4.ip_local_port_range doesn’t solve the problem, but it makes it more likely to find a port that is free.

Let’s trace back to the “if” in __inet_bind on #L529:
https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L529
Since we call bind with sockaddr_in.sin_port set to 0, snum is 0, and we can avoid the whole call chain by setting inet->bind_address_no_port to 1, i.e. this patch:
https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579/diffs?commit_id=b65ffa6f06b2d7bc313e0780f3d76a8acb499ac9#a65580094313324792dd24fed1904263b271abd5_2227_2230
That should allow the kernel to use already in use src ports as long as the TCP 4-tuple is unique.
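A minimal sketch of what that amounts to from userspace, on a Linux host (the constant is 24 in linux/in.h; newer Pythons may also expose it as socket.IP_BIND_ADDRESS_NO_PORT; the addresses are placeholders):

import socket

IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)

# Placeholder listener standing in for some remote ORPort.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

out = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the kernel not to pick a source port at bind() time...
out.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
# ...so bind() only pins the source address (OutboundBindAddress in tor's case)...
out.bind(("127.0.0.2", 0))
# ...and the source port is chosen inside connect(), where the full 4-tuple is
# known and the inet_csk_get_port/inet_csk_bind_conflict path is skipped.
out.connect(srv.getsockname())
print("source:", out.getsockname())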

Please include it in the next tor release! :)

  • Anders
···

On Fri, Dec 9, 2022 at 10:47 AM Alexander Færøy <ahf@torproject.org> wrote:

···

Hi David

IP_BIND_ADDRESS_NO_PORT did not fix your somewhat similar problem in your Haproxy setup, because all the connections are to the same dst tuple <ip, port> (i.e 127.0.0.1:ExtORPort).
The connect() system call is looking for a unique 5-tuple <protocol, srcip, srcport, dstip, dstport>. In the Haproxy setup, the only free variable is srcport <tcp, 127.0.0.1, srcport, 127.0.0.1, ExtORPort>, so toggling IP_BIND_ADDRESS_NO_PORT makes no difference.

The following should help (unless we have found a bug in Linux):

  1. Let tor listen on a bunch of different ExtORPorts
  2. Let tor listen on a bunch of IPs for the ExtORPort (so we have #ExtORPorts * #ExtORPortListenIPs unique combinations)
  3. Connect from different src IPs (what you already implemented)
  4. sysctl -w net.ipv4.ip_local_port_range="1024 65535"

For 1 and 2 to make a difference, if you do 3 (i.e. bind before connect), you need IP_BIND_ADDRESS_NO_PORT enabled on the socket.

Tor relays already connect to many different dstip:dstport pairs, so enabling IP_BIND_ADDRESS_NO_PORT should solve our problem.

I rest my case ;)

Best regards
Anders Trier Olesen

···

On Sat, Dec 10, 2022 at 5:41 AM David Fifield <david@bamsoftware.com> wrote:

···

No—that is what I thought too, at first, but experimentally it is not
the case. Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
*doing nothing else* is sufficient to resolve the problem. Haproxy ends
up binding to the same address it would have bound to with
IP_BIND_ADDRESS_NO_PORT, and there are the same number of 5-tuples to
the same endpoints, but the EADDRNOTAVAIL errors stop. It is
counterintuitive and unexpected, which is why I took the trouble to write
it up.

As I wrote at #40201, there are divergent code paths for connect in the
kernel when the port is already bound versus when it is not bound. It's
not as simple as filling in blanks in a 5-tuple in otherwise identical
code paths.

Anyway, it is not true that all connections go to the same (IP, port).
(There would be no need to use a load balancer if that were the case.)
At the time, we were running 12 tor processes with 12 different
ExtORPorts (each ExtORPort on a different IP address, even: 127.0.3.1,
127.0.3.2, etc.). We started to have EADDRNOTAVAIL problems at around
3000 connections per ExtORPort, which is far too few to have exhausted
the 5-tuple space. Please check the discussion at #40201 again, because
I documented this detail there.

I urge you to run an experiment yourself, if these observations are not
what you expect. I was surprised, as well.

···

On Sat, Dec 10, 2022 at 09:59:14AM +0100, Anders Trier Olesen wrote:

···

Also see this patch, which introduces net.ipv4.ip_autobind_reuse:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4b01a9674231a97553a55456d883f584e948a78d

Enabling net.ipv4.ip_autobind_reuse allows the kernel to bind SO_REUSEADDR-enabled sockets (which I think they are in tor) to the same <addr, port> only when all ephemeral ports are exhausted. (So it should fix the “resource exhausted” bugs, but we’ll still spend way too much time in the kernel looking for free ports before giving up and checking whether net.ipv4.ip_autobind_reuse is toggled.)

It is only safe to use when you know that you’ll not have tons of connections to the same dstip:dstport (so not safe to use in the haproxy setup), which is why it “should only be set by experts”, and they suggest using IP_BIND_ADDRESS_NO_PORT instead:

ip_autobind_reuse - BOOLEAN
By default, bind() does not select the ports automatically even if
the new socket and all sockets bound to the port have SO_REUSEADDR.
ip_autobind_reuse allows bind() to reuse the port and this is useful
when you use bind()+connect(), but may break some applications.
The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this
option (i.e ip_autobind_reuse) should only be set by experts.
Default: 0

I’ve enabled sysctl -w net.ipv4.ip_autobind_reuse=1 on the dotsrc exits for now, while we wait for IP_BIND_ADDRESS_NO_PORT.
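If the knob earns its keep, the usual way to make it persistent is a sysctl drop-in (the file name is arbitrary):

# /etc/sysctl.d/99-tor-autobind.conf
net.ipv4.ip_autobind_reuse = 1

# apply without rebooting:
sysctl --system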

  • Anders
···

On Sat, Dec 10, 2022 at 9:59 AM Anders Trier Olesen <anders.trier.olesen@gmail.com> wrote:

···

> I urge you to run an experiment yourself, if these observations are not
> what you expect. I was surprised, as well.

Very interesting. I’ll run some tests.

We do agree that IP_BIND_ADDRESS_NO_PORT should fix the OP’s problem, right? With it enabled, there’s no path to inet_csk_bind_conflict, which is where the OP’s CPUs spend too much time.

  • Anders
···

On Sat, Dec 10, 2022 at 4:23 PM David Fifield <david@bamsoftware.com> wrote:

···

I wrote some tests[1] which showed behaviour I did not expect. IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it enabled turns out to be even worse than I thought.

This is what I think is happening: A successful bind() on a socket without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port configured, makes the assigned (or supplied) port unavailable for new connect()s (on different sockets), no matter the destination. I.e. if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no matter what IP you bind to!), connect() will stop working, no matter what IP you attempt to connect to. You can work around this by manually doing a bind() (with or without an explicit port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
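A stand-alone way to see this (not the connect.py from [1]; it assumes the narrowed net.ipv4.ip_local_port_range used in the transcript below, so the loop finishes quickly):

import socket

held = []
try:
    while True:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # bind() without IP_BIND_ADDRESS_NO_PORT claims one ephemeral port per
        # socket, even though we never listen or connect.
        s.bind(("127.0.0.1", 0))
        held.append(s)
except OSError as e:
    print(f"bind() gave up after {len(held)} sockets: {e}")   # EADDRINUSE

# With the range exhausted by bind(), a plain connect() with no prior bind()
# now fails with EADDRNOTAVAIL, regardless of destination.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    probe.connect(("127.0.0.1", 22))   # arbitrary destination
except OSError as e:
    print(f"connect() failed: {e}")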

$ uname -a
Linux laptop 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

sysctl -w net.ipv4.ip_local_port_range="40000 40100"

$ cd server && cargo run &
Version used: https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests/blob/e74b09f680bb01a0078fe7e043e786c111103647/connect.py
$ …/connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 2

Test 2

Error on bind: [Errno 98] Address already in use
Made 101 connections. Expected to be around 101.
Select test (1-6): 1

Test 1

Error on connect: [Errno 99] Cannot assign requested address
Made 0 connections. Expected to be around 101.
Select test (1-6): 3

Test 3

Error on bind: [Errno 98] Address already in use
Made 200 connections. Expected to be around 202.

What blows my mind is that after running test2, you cannot connect to anything without manually doing a bind() beforehand (as shown by test1 and test3 above)! This also means that after running test2, software like ssh stops working:
$ ssh -v mirrors.dotsrc.org
[…]
debug1: connect to address 130.225.254.116 port 22: Cannot assign requested address

When using IP_BIND_ADDRESS_NO_PORT, we don’t have this problem (tests 1, 5 and 6 can be run in any order):
$ ./connect.py
Raised RLIMIT_NOFILE softlimit from 1024 to 200000
Select test (1-6): 5

Test 5

Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.
Select test (1-6): 6

Test 6

Error on connect: [Errno 99] Cannot assign requested address
Made 180 connections. Expected to be around 202.
Select test (1-6): 1

Test 1

Error on connect: [Errno 99] Cannot assign requested address
Made 90 connections. Expected to be around 101.

> Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
> doing nothing else is sufficient to resolve the problem.

Maybe there are other processes on the same host which calls bind() without IP_BIND_ADDRESS_NO_PORT, and blocks the ports? E.g OutboundBindAddress or similar in torrc?

[1] https://github.com/AndersTrier/IP_BIND_ADDRESS_NO_PORT_tests

···

On Sat, Dec 10, 2022 at 7:15 PM Anders Trier Olesen <anders.trier.olesen@gmail.com> wrote:

···

I wonder if IP_BIND_ADDRESS_NO_PORT is better implemented in Nginx?

https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/

Respectfully,

Gary

···

On Saturday, December 10, 2022, 7:23:28 AM PST, David Fifield <david@bamsoftware.com> wrote:

···


This Message Originated by the Sun.
iBigBlue 63W Solar Array (~12 Hour Charge)

  • 2 x Charmast 26800mAh Power Banks
    = iPhone XS Max 512GB (~2 Weeks Charged)

I wrote some tests[1] which showed behaviour I did not expect.
IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind without it
enabled turns out to be even worse than I thought.
This is what I think is happening: A successful bind() on a socket without
IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit port configured,
makes the assigned (or supplied) port unavailable for new connect()s (on
different sockets), no matter the destination. I.e. if you exhaust the entire
net.ipv4.ip_local_port_range with bind() (no matter what IP you bind to!),
connect() will stop working - no matter what IP you attempt to connect to. You
can work around this by manually doing a bind() (with or without an explicit
port, but without IP_BIND_ADDRESS_NO_PORT) on the socket before connect().

What blows my mind is that after running test2, you cannot connect to anything
without manually doing a bind() beforehand (as shown by test1 and test3 above)!
This also means that after running test2, software like ssh stops working:

When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (1 5 6 can be
run in any order):
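
(As an illustration of the behaviour described above, and not the linked tests[1] themselves, a minimal sketch could look like the following. It assumes a listener you control on 127.0.0.1:8080; actually exhausting the range in step 1 requires an RLIMIT_NOFILE larger than net.ipv4.ip_local_port_range, or a temporarily shrunken range.)

    /* Illustrative sketch of the behaviour described above. */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static struct sockaddr_in addr(const char *ip, int port)
    {
        struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(port) };
        inet_pton(AF_INET, ip, &a.sin_addr);
        return a;
    }

    int main(void)
    {
        /* 1. Hold sockets bound (no IP_BIND_ADDRESS_NO_PORT) to ephemeral
         *    ports on 127.1.2.3 until the range (or the fd limit) runs out.
         *    Per the description above, the bind IP does not matter. */
        for (;;) {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in a = addr("127.1.2.3", 0);
            if (s < 0)
                break;
            if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0) {
                close(s);
                break;
            }
        }

        struct sockaddr_in dst = addr("127.0.0.1", 8080);

        /* 2. connect() on an unbound socket: expected to fail with
         *    EADDRNOTAVAIL once the range is exhausted. */
        int c1 = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(c1, (struct sockaddr *)&dst, sizeof(dst)) < 0)
            printf("connect without bind: %s\n", strerror(errno));

        /* 3. bind() to a local address first (here a different loopback
         *    address, port 0, no IP_BIND_ADDRESS_NO_PORT), then connect():
         *    per the workaround described above, this still succeeds. */
        int c2 = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in src = addr("127.0.0.1", 0);
        if (bind(c2, (struct sockaddr *)&src, sizeof(src)) == 0 &&
            connect(c2, (struct sockaddr *)&dst, sizeof(dst)) == 0)
            printf("bind then connect: ok\n");

        return 0;
    }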

Thank you for preparing that experiment. It's really valuable, and it
looks a lot like what I was seeing on the Snowflake bridge: calls to
connect would fail with EADDRNOTAVAIL unless first bound concretely to a
port number. IP_BIND_ADDRESS_NO_PORT causes bind not to set a concrete
port number, so in that respect it's the same as calling connect without
calling bind first.

It is surprising, isn't it? It certainly feels like calling connect
without first binding to an address should have the same effect as
manually binding to an address and then calling connect, especially if
the address you bind to is the same as the kernel would have chosen
automatically. It seems like it might be a bug, but I'm not qualified to
judge that.

If I am interpreting your results correctly, it means that either of the
two extremes is safe: either everything that needs to bind to a source
address should call bind with IP_BIND_ADDRESS_NO_PORT, or else
everything (whether it needs a specific source address or not) should
call bind *without* IP_BIND_ADDRESS_NO_PORT. (The latter situation is
what we've arrived at on the Snowflake bridge.) The middle ground, where
some connections use IP_BIND_ADDRESS_NO_PORT and some do not, is what
causes trouble, because connections that do not use
IP_BIND_ADDRESS_NO_PORT somehow "poison" the ephemeral port pool for
connections that do use IP_BIND_ADDRESS_NO_PORT (and for connections
that do not bind at all). It would explain why causing HAProxy not to
use IP_BIND_ADDRESS_NO_PORT resolved errors in my case.

> Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
> *doing nothing else* is sufficient to resolve the problem.

Maybe there are other processes on the same host which call bind() without
IP_BIND_ADDRESS_NO_PORT and block the ports? E.g. OutboundBindAddress or
similar in torrc?

OutboundBindAddress is a likely culprit. We did end up setting
OutboundBindAddress on the bridge during the period of intense
performance debugging at the end of September.
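
For reference, the setting in question is a single torrc line of this form (192.0.2.10 is a placeholder); with it set, tor bind()s each outgoing socket to that address, which on the tor versions discussed here is exactly the kind of bind() without IP_BIND_ADDRESS_NO_PORT that Anders describes:

    # torrc excerpt (illustrative)
    OutboundBindAddress 192.0.2.10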

One thing doesn't quite add up, though. The earliest EADDRNOTAVAIL log
messages started at 2022-09-28 10:57:26:
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40198
Whereas according to the change history of /etc on the bridge,
OutboundBindAddress was first set some time between 2022-09-29 21:38:37
and 2022-09-29 22:37:06, over 30 hours later. I would be tempted to say
this is a case of what you initially suspected, simple tuple exhaustion
between two static IP addresses, if not for the fact that pre-binding an
address resolved the problem in that case as well ("I get EADDRNOTAVAIL
sometimes even with netcat, making a connection to the haproxy port—but
not if I specify a source address in netcat"). But I only ran that
netcat test after OutboundBindAddress had been set, so there may have
been many factors being conflated.

Anyway, thank you for the insight. I apologize if I was inconsiderate
in my prior reply.

···

On Mon, Dec 12, 2022 at 12:39:50AM +0100, Anders Trier Olesen wrote:
_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

It is surprising, isn’t it? It certainly feels like calling connect
without first binding to an address should have the same effect as
manually binding to an address and then calling connect, especially if
the address you bind to is the same as the kernel would have chosen
automatically. It seems like it might be a bug, but I’m not qualified to
judge that.

Yes, I’m starting to think so too. And strange that Cloudflare doesn’t mention stumbling upon this problem in their blog post on running out of ephemeral ports. [1]

If I find the time, I’ll make an attempt at understanding exactly what is going on in the kernel.

If I am interpreting your results correctly, it means that either of the
two extremes is safe

Yes. That is what I think too.

Anyway, thank you for the insight. I apologize if I was inconsiderate
in my prior reply.

Likewise!

Best regards

Anders Trier Olesen

[1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/

···

On Mon, Dec 12, 2022 at 4:16 PM David Fifield <david@bamsoftware.com> wrote:



tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

As I'm sure you've already gathered, your system is maxing out trying to
deal with all the connection requests. When inet_csk_get_port is called
and the port is found to be occupied then inet_csk_bind_conflict is
called to resolve the conflict. So in normal circumstances you shouldn't
see it in perf top much less at 79%. There are two ways to deal with it,
and each method should be complemented by the other. One way is to try
to increase the number of ports and reduce the wait time which you have
somehow tried. I would add the following:

I use old dual Intel Xeon E5-2680v2 CPUs with 256 GB RAM, and the Tor IPs/traffic are
routed over a dual 10G NIC (40 exit relays).

> net.ipv4.tcp_fin_timeout = 20

net.ipv4.tcp_fin_timeout = 4

> net.ipv4.tcp_max_tw_buckets = 1200

net.ipv4.tcp_max_tw_buckets = 2000000

> net.ipv4.tcp_keepalive_time = 1200

net.ipv4.tcp_keepalive_time = 60

> net.ipv4.tcp_max_syn_backlog = 8192

net.core.netdev_max_backlog = 262144
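
(If anyone wants to try values like these, one way to apply them persistently is a sysctl drop-in; the values below are just the ones from this post, not a recommendation.)

    # /etc/sysctl.d/99-tor-relay.conf -- illustrative only, values from above
    net.ipv4.tcp_fin_timeout = 4
    net.ipv4.tcp_max_tw_buckets = 2000000
    net.ipv4.tcp_keepalive_time = 60
    net.core.netdev_max_backlog = 262144

Running sysctl --system then loads it without a reboot.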

···

On Friday, 2 December 2022 at 16:30:48 CET, Chris wrote:

The complementary method to the above is to lower the number of
connection requests by removing frivolous connection requests from
the equation using a few iptables rules.

I'm assuming the increased load you're experiencing is due to the
current DDoS attacks, and I'm not sure if you're using anything to
mitigate that, but you should consider it.

You may find something useful at the following links

[1] https://github.com/Enkidu-6/tor-ddos (iptables rules for Tor relay operators to mitigate DDoS)

[2] https://github.com/toralf/torutils (a few tools for a Tor relay)

[background] https://gitlab.torproject.org/tpo/community/support/-/issues/40093
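
(The linked repositories contain the actual rule sets. Purely to illustrate the kind of iptables rule meant here, and not taken from those repositories, something like the following caps concurrent connections and the rate of new ones per source IP to a relay's ORPort, with 443 as a placeholder port.)

    # Illustrative only -- not the rules from the repositories above.
    # Cap concurrent connections per source IP to the ORPort:
    iptables -A INPUT -p tcp --dport 443 --syn -m connlimit \
        --connlimit-above 4 --connlimit-mask 32 -j DROP
    # Rate-limit new connection attempts per source IP:
    iptables -A INPUT -p tcp --dport 443 --syn -m hashlimit \
        --hashlimit-name tor-orport --hashlimit-mode srcip \
        --hashlimit-above 10/minute --hashlimit-burst 20 -j DROP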

Cheers.

On 12/1/2022 3:35 PM, Christopher Sheats wrote:
> Hello tor-relays,
>
> We are using Ubuntu server currently for our exit relays.
> Occasionally, exit throughput will drop from ~4Gbps down to ~200Mbps
> and the only observable data point that we have is a significant
> increase in inet_csk_bind_conflict, as seen via 'perf top', where it
> will hit 85% [kernel] utilization.
>
> A while back we thought we solved this with two /etc/sysctl.conf settings:
> net.ipv4.ip_local_port_range = 1024 65535
> net.ipv4.tcp_tw_reuse = 1
>
> However we are still experiencing this problem.
>
> Both of our (currently, two) relay servers suffer from the same
> problem, at the same time. They are AMD Epyc 7402P bare-metal servers
> each with 96GB RAM, each has 20 exit relays on them. This issue
> persists after upgrading to 0.4.7.11.
>
> Screenshots of perf top are shared
> here: https://digitalcourage.social/@EmeraldOnion/109440197076214023
>
> Does anyone have experience troubleshooting and/or fixing this problem?

--
╰_╯ Ciao Marco!

Debian GNU/Linux

It's free software and it gives you freedom!

I am happy to report that we have upgraded all our relays to Tor 0.4.8.0-alpha-dev, and for the past 8 days since the upgrade the bind conflict has ceased. No firewall rules are being used. No sysctl settings helped.

···


Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion

On Dec 12, 2022, at 1:18 PM, Anders Trier Olesen anders.trier.olesen@gmail.com wrote:



tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays

> It is surprising, isn't it? It certainly feels like calling connect
> without first binding to an address should have the same effect as
> manually binding to an address and then calling connect, especially if
> the address you bind to is the same as the kernel would have chosen
> automatically. It seems like it might be a bug, but I'm not qualified to
> judge that.

Yes, I'm starting to think so too. And strange that Cloudflare doesn't mention
stumbling upon this problem in their blogpost on running out of ephemeral
ports. [1]
[1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/
If I find the time, I'll make an attempt at understanding exactly what is going
on in the kernel.

Cloudflare has another blog post today that gets into this topic.

It investigates the difference in behavior between
inet_csk_bind_conflict and __inet_hash_connect that I commented on
earlier in this thread and at
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40201.
Setting the IP_BIND_ADDRESS_NO_PORT option leads to __inet_hash_connect;
not setting it leads to inet_csk_bind_conflict.

The author attributes the difference in behavior to the fastreuse field
in the bind hash bucket:

···

On Mon, Dec 12, 2022 at 10:18:53PM +0100, Anders Trier Olesen wrote:

The bucket might already exist or we might have to create it first.
But once it exists, its fastreuse field is in one of three possible
states: -1, 0, or +1.

…inet_csk_get_port() skips conflict check for fastreuse == 1 buckets.
…__inet_hash_connect() skips buckets with fastreuse != -1.

_______________________________________________
tor-relays mailing list
tor-relays@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
