Tor relays: low traffic and random disconnects

Hi all, I’m running two Tor relays in Brazil: DDF778DF27F832737AC60AAE96207D01B38087D5 and 0BA445CFBABCF76F6D3118CE56B27DC47C8CA180. Both are on the same IP address, and I have a 250 Mbps fiber link. One relay (the machine with the most CPU power, a quad-core) is set to 60 Mbps, and the other, a Raspberry Pi, to 35 Mbps, as you can see in the metrics.

But the problems are:
1 - I’ve run other relays before and the same thing happened; I haven’t found a way to figure it out: the relays go offline randomly and don’t come back until I run “systemctl restart tor”. I can see in nyx that the daemon is still running, but the relays aren’t relaying any data. What could this be? They also show as offline on Tor Metrics. I’ve already checked my connection and routers, and everything seems quite stable.

2 - When they’re connected and working properly, the measured bandwidth is too low: you can see the limits I’ve applied versus the measured bandwidth. They don’t match, and I don’t receive as many connections as I should. Why?

For both “problems”: I know one of the relays is in its ramp-up period, but I’ve had other relays in the past that ran for more than 30 days on another address, and the same speed problem occurred.

I’ve already opened and verified all the necessary ports. And I don’t know why other relays, also in Brazil and using the same connection technology as mine, run much faster. Is there any setting I should apply?

Hello @lpsdamaceno,

The first thing that comes to mind: is your router capable of handling all the connections? It may be unstable and reducing bandwidth or dropping packets.

Do you have any error messages, or notices telling you there has been a change, in Tor’s notice logs?
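
One quick way to check is to filter the notice log down to warnings and errors. This is a sketch; the log path is an assumption (it is whatever the `Log` line in your torrc points at, `/var/log/tor/notices.log` on a typical Debian-style install):

```shell
#!/bin/sh
# Filter a Tor notice log down to its [warn] and [err] lines.
tor_problems() {
    grep -E '\[(warn|err)\]' "$1"
}

# On a live relay (path is an assumption -- check the Log line in torrc):
#   tor_problems /var/log/tor/notices.log
# Or, if Tor logs to journald:
#   journalctl -u tor@default | grep -E '\[(warn|err)\]'
```

Anything about reachability self-tests failing, clock jumps, or dropped circuits around the time the relay stopped relaying would be a strong clue.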

In my experience, when the IP address changes, you have to restart the service. Do you have a dynamic or a fixed address on your router?

Thanks

Thanks for your reply!

I have a static IP address on my router; it’s a plan from my ISP that I pay extra for. My router has 4 gigabit LAN ports, and I think it handles the connections well; I have no stability issues here.
The logs show nothing strange at all.
How can I be sure about the router connections? Is there any sysctl parameter to tune and see whether it’s really a load problem? On the nyx side, it shows around 1500–2000 connections in each direction, incoming and outgoing. My internet connection stays stable even with that number of connections. I also run an NTP server linked to the NTP Pool project; it sometimes receives 7000+ connections and everything keeps running well.
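
One thing worth checking on both the router and the relays is the connection-tracking table: when it fills up, new connections are silently dropped while everything else still looks healthy. A sketch that reads the standard Linux procfs counters (the paths won’t exist if the conntrack module isn’t loaded):

```shell
#!/bin/sh
# Report how full the kernel's connection-tracking table is.
# A relay pushing 1500-2000 connections, plus NAT on the router,
# can exhaust a small conntrack table.
conntrack_usage() {
    used=$(cat /proc/sys/net/netfilter/nf_conntrack_count 2>/dev/null || echo 0)
    max=$(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 0)
    if [ "$max" -gt 0 ]; then
        echo "conntrack: $used/$max ($((used * 100 / max))% full)"
    else
        echo "conntrack: module not loaded or counters unavailable"
    fi
}
conntrack_usage
```

A “nf_conntrack: table full” message in `dmesg` would be the smoking gun for dropped connections.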

Thanks for the details.

Can you send your torrc config, please?
What bandwidth do you want to set for your relays?

Also, are you certain that this 250 Mbps link is 250 both upload and download?

My torrc doesn’t have any special settings, just the defaults plus contact info, dir and OR ports, and my family setting.
The bandwidth rate and bandwidth burst are both 3.5 MB/s (I want to get the Guard flag someday, and I believe that needs at least 3 MB/s, so maybe 3.5 is enough). Should I send both torrc files anyway?
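
A minimal torrc along those lines would look something like this. This is a sketch, not my actual file: Nickname, ContactInfo, and ports are placeholders; the MyFamily fingerprints are the two relays from the first post:

```
Nickname ExampleRelay
ContactInfo tor-admin@example.org
ORPort 9001
DirPort 9030
MyFamily DDF778DF27F832737AC60AAE96207D01B38087D5,0BA445CFBABCF76F6D3118CE56B27DC47C8CA180
# 3.5 MB/s rate and burst, as described above
RelayBandwidthRate 3500 KBytes
RelayBandwidthBurst 3500 KBytes
```

Setting RelayBandwidthBurst somewhat higher than the rate would let the relay absorb short traffic spikes instead of clamping them.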

I’ve changed a few sysctl settings on my router (OpenWrt) and on both relays (they’re supposed to improve overall throughput, and they do).

Do you recommend any specific settings in torrc?

One relay’s hardware is a 4 GB Core 2 Quad home server; the other is a Raspberry Pi 3B running an arm64 system.

Yes, it’s a full 250 Mbps link; download and upload are both 250 Mbps.

Wow, I can’t think of many other reasons your relay would be slow. Any logs from Tor? Any router logs?

On the main router I’m totally blind; it isn’t possible to extract logs because it’s a model that doesn’t run OpenWrt. The “logs” I can see indicate that everything is normal, just a few DHCPv6 warnings, but nothing that should compromise the system. I’ve disabled NAT boost on this router, because I think it may kill connections earlier than it should. But let’s see if the other sysctl settings have any effect on these problems.

I’ve used this on the relays’ side:

vm.vfs_cache_pressure=600
vm.swappiness=100
vm.dirty_background_ratio=2
vm.dirty_ratio=60


# used on high bandwidth nodes (gbit interface)

# disabling forwarding first as this will
# reset some other values back to default (!)

net.ipv4.ip_forward = 0
net.ipv4.tcp_syncookies = 1
#net.ipv4.tcp_synack_retries = 2
#net.ipv4.tcp_syn_retries = 2
    
net.ipv4.conf.default.forwarding = 0 
net.ipv4.conf.default.proxy_arp = 0 
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.send_redirects = 0

kernel.sysrq = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1

# optimizations
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432 
net.ipv4.tcp_wmem = 4096 65536 33554432  
net.core.netdev_max_backlog = 131072
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
#net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.tcp_fin_timeout = 5 
vm.min_free_kbytes = 65536
#net.ipv4.netfilter.ip_conntrack_max = 196608
#net.netfilter.nf_conntrack_tcp_timeout_established = 7200
#net.netfilter.nf_conntrack_checksum = 0
#net.netfilter.nf_conntrack_max = 196608 
#net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 15
net.nf_conntrack_max = 262144
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
net.ipv4.ip_local_port_range = 1025 65530
net.core.somaxconn = 131072
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_timestamps = 1

net.ipv4.conf.all.forwarding=1

net.ipv4.tcp_congestion_control=westwood
net.ipv4.tcp_low_latency=1





#Prevent SYN attack, enable SYNcookies (they will kick-in when the max_syn_backlog reached)
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_max_syn_backlog = 4096

# Enable IP spoofing protection, turn on source route verification
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# Disable ICMP Redirect Acceptance
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 5

# Decrease the time default value for connections to keep alive
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 8

# Turn on the tcp_timestamps, accurate timestamp make TCP congestion control algorithms work better
net.ipv4.tcp_timestamps = 1

# Don't ignore directed pings
net.ipv4.icmp_echo_ignore_all = 0

# Enable ignoring broadcasts request
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Enable bad error message Protection
net.ipv4.icmp_ignore_bogus_error_responses = 1

# Enable a fix for RFC1337 - time-wait assassination hazards in TCP
net.ipv4.tcp_rfc1337 = 1

# Do not auto-configure IPv6



###
### TUNING NETWORK PERFORMANCE ###
###

# Use BBR TCP congestion control and set tcp_notsent_lowat to 16384 to ensure HTTP/2 prioritization works optimally
# Do a 'modprobe tcp_bbr' first (kernel > 4.9)
# Fall-back to htcp if bbr is unavailable (older kernels)
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_notsent_lowat = 16384
    
# For servers with tcp-heavy workloads, enable 'fq' queue management scheduler (kernel > 3.12)
net.core.default_qdisc = fq

# Turn on the tcp_window_scaling
net.ipv4.tcp_window_scaling = 1

# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.udp_rmem_min = 32768
net.core.rmem_default = 262144
net.core.rmem_max = 16777216

# Increase the write-buffer-space allocatable
net.ipv4.tcp_wmem = 8192 65536 16777216
net.ipv4.udp_wmem_min = 32768
net.core.wmem_default = 262144
net.core.wmem_max = 16777216

# Increase number of incoming connections
net.core.somaxconn = 32768

# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 16384
net.core.dev_weight = 64

# Increase the maximum amount of option memory buffers
net.core.optmem_max = 65535

# Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000

# try to reuse time-wait connections, but don't recycle them (recycle can break clients behind NAT)
#net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 1

# Limit number of orphans, each orphan can eat up to 16M (max wmem) of unswappable memory
net.ipv4.tcp_max_orphans = 16384
net.ipv4.tcp_orphan_retries = 0

# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1

# Increase size of RPC datagram queue length
net.unix.max_dgram_qlen = 50

# Don't allow the arp table to become bigger than this
net.ipv4.neigh.default.gc_thresh3 = 2048

# Tell the gc when to become aggressive with arp table cleaning.
# Adjust this based on size of the LAN. 1024 is suitable for most /24 networks
net.ipv4.neigh.default.gc_thresh2 = 1024

# Adjust where the gc will leave arp table alone - set to 32.
net.ipv4.neigh.default.gc_thresh1 = 32

# Adjust to arp table gc to clean-up more often
net.ipv4.neigh.default.gc_interval = 30

# Increase TCP queue length
net.ipv4.neigh.default.proxy_qlen = 96
net.ipv4.neigh.default.unres_qlen = 6

# Enable Explicit Congestion Notification (RFC 3168), disable it if it doesn't work for you
net.ipv4.tcp_ecn = 1
net.ipv4.tcp_reordering = 3

# How many times to retry killing an alive TCP connection
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3

# Avoid falling back to slow start after a connection goes idle
# keeps our cwnd large with the keep alive connections (kernel > 3.6)
net.ipv4.tcp_slow_start_after_idle = 0

# Allow the TCP fastopen flag to be used, beware some firewalls do not like TFO! (kernel > 3.7)
net.ipv4.tcp_fastopen = 3

# This will ensure that immediately subsequent connections use the new values
net.ipv4.route.flush = 1

net.netfilter.nf_conntrack_udp_timeout=15
net.netfilter.nf_conntrack_udp_timeout_stream=30
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0

And this on the OpenWrt router’s side:

net.ipv4.tcp_congestion_control=bbr
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 15000
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fin_timeout = 5 
net.netfilter.nf_conntrack_tcp_timeout_established = 120
net.netfilter.nf_conntrack_udp_timeout = 15
net.netfilter.nf_conntrack_udp_timeout_stream = 60
net.ipv4.conf.default.arp_ignore = 1
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.ip_forward = 1
net.ipv4.tcp_timestamps = 0
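
Whether these values actually took effect is worth verifying. A sketch that reads /proc/sys directly (so it works even where the sysctl binary isn’t on PATH), checking two of the keys from the list above:

```shell
#!/bin/sh
# Compare a live kernel setting against the value we expect to have applied.
check_sysctl() {
    key=$1
    want=$2
    # sysctl key "a.b.c" maps to the file /proc/sys/a/b/c
    path="/proc/sys/$(printf '%s' "$key" | tr . /)"
    got=$(cat "$path" 2>/dev/null)
    if [ "$got" = "$want" ]; then
        echo "OK       $key = $got"
    else
        echo "MISMATCH $key: want $want, got ${got:-unset}"
    fi
}

# Two of the values applied above:
check_sysctl net.ipv4.tcp_congestion_control bbr
check_sysctl net.core.netdev_max_backlog 15000
```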

Luiz

You could try resetting all network settings on your relays to their defaults; touching those settings can affect Tor’s behavior. I am not a developer, but the system’s default settings work without problems and there is no reason to change them.

And about the Raspberry Pi: a middle relay will never work properly on it because the CPU is not powerful enough to handle all the connections.
You can read about it in my old post: Running a bridge on a Raspberry Pi 2, still worth it? - #2 by Superpaul209
Since you are running a middle relay on the same IP as your Raspberry Pi, hosting a bridge there won’t help the network either. So I recommend you set a bandwidth limit on this relay (you shouldn’t go over 3 MB/s, or even 2 MB/s).

I hope it helps you :+1:

I was using all the defaults when I started this post. Before, I had scheduled a “systemctl restart tor” every 00:00 UTC with cron. Since yesterday, with the settings I mentioned above, things seem quite stable: both relays haven’t disconnected and everything still seems to be working.
Maybe the TCP tunings had some effect. I’ve disabled the cron job that restarts tor, and I’ll monitor over the next few days; in a week I’ll update this topic if everything goes well. If not, I’ll update as soon as the error occurs.

As you can see, both of my relays,
PlasmaMiddle and MyArmyOfLovers, have been up for the same amount of time because both restarted around 00:00 UTC. Let’s see if the measured bandwidth increases; if so, I will allocate more bandwidth to them.

Both are handling around 1k connections with low CPU and memory usage right now.

PlasmaMiddle is the Raspberry Pi 3B running in arm64 mode, and MyArmyOfLovers is the Core 2 Quad server.