Tor relay crashes since upgrading to 0.4.7.10

Hi!

Since I upgraded to Tor 0.4.7.10, some of my Tor relays (different ones each time) crash regularly. I’m not sure whether this is caused by a change in 0.4.7.10 or whether it’s just a coincidence that it started after I upgraded to the new version. It might just as well be a mistake on my side or faulty hardware.

My operating system is FreeBSD 13.1 with Tor 0.4.7.10 and OpenSSL 1.1.1o. At first I thought about faulty memory, but the RAM is pretty much brand new and finished a complete run of MemTest86 before I put the server in the DC. So that should be unlikely, but not impossible.

Note that these error messages come from different relays. It’s rarely one relay that crashes, mostly two at a time. When it happens, the others keep going as if nothing happened.

I’m not sure whether the addresses in the stack traces are sensitive information, so I replaced them with 0x0000000 just in case. I can edit that later if need be.

[err] tor_assertion_failed_: Bug: src/feature/nodelist/routerlist.c:3247: routerlist_assert_ok: Assertion tor_memeq(sd->extra_info_digest, d, 20) failed; aborting. (on Tor 0.4.7.10 )
[err] Bug: Tor 0.4.7.10: Assertion tor_memeq(sd->extra_info_digest, d, 20) failed in routerlist_assert_ok at src/feature/nodelist/routerlist.c:3247: . Stack trace: (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <log_backtrace_impl+0x5c> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_assertion_failed_+0x142> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <routerlist_assert_ok+0x4f6> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <router_load_routers_from_string+0x3d9> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_dir_reached_eof+0x1a82> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_handle_read+0xbfd> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_add_impl+0x239> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x000000000 <event_base_assert_ok_nolock_+0xbfd> at /usr/local/lib/libevent-2.1.so.7 (on Tor 0.4.7.10 )
[err] Bug:     0x000000000 <event_base_loop+0x58c> at /usr/local/lib/libevent-2.1.so.7 (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <do_main_loop+0x12a> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_run_main+0x12c> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_main+0x61> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] tor_assertion_failed_: Bug: ./src/core/or/circuitmux_ewma.h:128: TO_EWMA_POL_CIRC_DATA: Assertion pol->magic == EWMA_POL_CIRC_DATA_MAGIC failed; aborting. (on Tor 0.4.7.10 )
[err] Bug: Tor 0.4.7.10: Assertion pol->magic == EWMA_POL_CIRC_DATA_MAGIC failed in TO_EWMA_POL_CIRC_DATA at ./src/core/or/circuitmux_ewma.h:128: Mismatch: 0 != 1981708103. Stack trace: (on T>
[err] Bug:     0x0000000 <log_backtrace_impl+0x5c> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_assertion_failed_+0x142> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <ocirc_cevent_publish+0x5a6> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <circuitmux_detach_circuit+0x1e8> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <circuit_set_p_circid_chan+0x172> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <circuit_set_p_circid_chan+0x40> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <command_process_cell+0x386> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_or_process_inbuf+0x1f2> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_handle_read+0x8a1> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <connection_add_impl+0x239> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x000000000 <event_base_assert_ok_nolock_+0xbfd> at /usr/local/lib/libevent-2.1.so.7 (on Tor 0.4.7.10 )
[err] Bug:     0x000000000 <event_base_loop+0x58c> at /usr/local/lib/libevent-2.1.so.7 (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <do_main_loop+0x12a> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_run_main+0x12c> at /usr/local/bin/tor (on Tor 0.4.7.10 )
[err] Bug:     0x0000000 <tor_main+0x61> at /usr/local/bin/tor (on Tor 0.4.7.10 )
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.10 at src/feature/hs/hs_circuitmap.c:79: b == (((head->hth_table[b])->hs_circuitmap_node.hte_hash) % head->hth_table_length)
0x0000000 <dump_stack_symbols_to_error_fds+0x56> at /usr/local/bin/tor
0x0000000 <tor_raw_assertion_failed_msg_+0x1a9> at /usr/local/bin/tor
0x0000000 <hs_circuitmap_get_all_intro_circ_relay_side+0x156> at /usr/local/bin/tor
0x0000000 <hs_dos_consensus_has_changed+0xe4> at /usr/local/bin/tor
0x0000000 <networkstatus_set_current_consensus+0xa99> at /usr/local/bin/tor
0x0000000 <connection_dir_reached_eof+0x1b5a> at /usr/local/bin/tor
0x0000000 <connection_handle_read+0xbfd> at /usr/local/bin/tor
0x0000000 <connection_add_impl+0x239> at /usr/local/bin/tor
0x000000000 <event_base_assert_ok_nolock_+0xbfd> at /usr/local/lib/libevent-2.1.so.7
0x000000000 <event_base_loop+0x58c> at /usr/local/lib/libevent-2.1.so.7
0x0000000 <do_main_loop+0x12a> at /usr/local/bin/tor
0x0000000 <tor_run_main+0x12c> at /usr/local/bin/tor
0x0000000 <tor_main+0x61> at /usr/local/bin/tor

Has anyone experienced this before? Is there any way to troubleshoot this further? Thanks in advance for your help :slight_smile:
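When several relays crash with different assertions, it can help to tally the failures by source location and see whether any pattern emerges. Below is a minimal Python sketch of that idea. It assumes the log lines look like the `tor_assertion_failed_` and `INTERNAL ERROR` lines quoted above; the function name `tally_assertions` and the regular expressions are illustrative only and not part of Tor.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread. Other Tor
# versions may format these messages differently (assumption).
ASSERT_RE = re.compile(
    r"tor_assertion_failed_: Bug: (?P<file>\S+):(?P<line>\d+): (?P<func>\w+):"
)
RAW_ASSERT_RE = re.compile(
    r"INTERNAL ERROR: Raw assertion failed in Tor \S+ at (?P<file>\S+):(?P<line>\d+):"
)


def tally_assertions(log_lines):
    """Count assertion failures per 'file:line' source location."""
    counts = Counter()
    for text in log_lines:
        match = ASSERT_RE.search(text) or RAW_ASSERT_RE.search(text)
        if match:
            path = match.group("file")
            if path.startswith("./"):
                path = path[2:]  # normalise './src/...' to 'src/...'
            counts[f"{path}:{match.group('line')}"] += 1
    return counts
```

Feeding it the combined log lines of all relays and printing `tally_assertions(lines).most_common()` would show whether one assertion dominates or whether the failures are spread over unrelated parts of the code, which would point more towards hardware.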

My thoughts are similar: most likely the problem is faulty hardware, and most likely faulty RAM.
The crashes look random to me, but I’m not sure I’m right about that.
Is it possible to run MemTest86 remotely? Or maybe to find RAM tests that don’t require restarting the computer? However, I trust only MemTest86.

MemTest86 might also run on the running OS, but I guess (if it even runs on it as a program) it will not be able to test allocated/active memory? So that will be of limited use, I think? Sadly, I cannot remotely run MemTest86 from boot (which I did before I moved the server to the DC). But anyway, the next step will be testing the RAM one way or another…

For now (until I can get to the DC) I have added a cron job that checks whether the relays are running and restarts them if they are not.
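A watchdog along those lines could look roughly like the following Python sketch. It is only an illustration of the idea, not the actual cron job: the PID file locations, the relay names, and the restart script are hypothetical placeholders that depend on how the relay instances are set up.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical relay instances (name -> PID file); adjust to the real setup.
RELAYS = {
    "relay1": Path("/var/run/tor-instances/relay1/tor.pid"),
    "relay2": Path("/var/run/tor-instances/relay2/tor.pid"),
}


def is_running(pid_file):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable PID file: treat as not running
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def restart(name):
    """Placeholder: 'restart-relay' is a hypothetical script, not a real tool."""
    subprocess.run(["/usr/local/bin/restart-relay", name], check=False)


def check_relays(relays, running=is_running, restart_fn=restart):
    """Restart every relay that is not running; return the restarted names."""
    restarted = []
    for name, pid_file in relays.items():
        if not running(pid_file):
            restart_fn(name)
            restarted.append(name)
    return restarted
```

Run from cron every few minutes, `check_relays(RELAYS)` would bring crashed relays back. Logging the returned names with a timestamp would also give a record of which relays crash together, which may help with the diagnosis.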

Thanks for your reply :slight_smile: If other people have similar or other thoughts, let me know!

You can unload as many programs as you can, run the test, and then load the programs again.
There is a chance that the problematic memory will be in use by the OS while you run the test, or that the test will simply be less thorough than MemTest86. So if it finds a problem, the problem is most likely real; if it doesn’t find one, the problem may still be there, but with a much lower chance.
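To illustrate what a test running inside the OS can and cannot do, here is a minimal Python sketch of a user-space pattern test. It only exercises whatever memory the OS happens to hand to this one process, and Python gives no control over caching or physical placement, so a clean result proves little while a mismatch is a strong hint. The function name and the choice of patterns are illustrative assumptions; a dedicated tool is preferable for real testing.

```python
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits


def pattern_test(size_bytes, patterns=PATTERNS):
    """Fill a buffer with each pattern, read it back, and return mismatches.

    Returns a list of (pattern, offset, value_read) tuples. An empty list
    means no error was seen in the memory this process happened to get.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected
        if buf != expected:  # only scan byte by byte when something differs
            errors.extend(
                (pattern, offset, value)
                for offset, value in enumerate(buf)
                if value != pattern
            )
    return errors
```

For example, `pattern_test(256 * 1024 * 1024)` would exercise a 256 MiB buffer. Stopping other programs first, as suggested above, lets a larger buffer be tested.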

This may seem like a dumb suggestion, but I’d advise compiling the 1.1.1q version of OpenSSL.

The last few times I’ve built Tor from source, the obj,i,k and p versions gave me a bunch of shit and threw a bunch of weird errors. Here are the OpenSSL release notes: /news/openssl-1.1.1-notes.html
You might try compiling 1.1.1q from source if it isn’t available from the FreeBSD repos.

This may not even be the issue, but it might help.
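Since the OpenSSL 1.1.1 releases differ only by a letter suffix, it is easy to misjudge which one is newer. A small Python sketch for comparing them is shown here; the function names are my own, and it assumes version strings of the form `1.1.1o` or `3.0.5`.

```python
import re

VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$")


def parse_openssl_version(text):
    """Split e.g. '1.1.1o' into (1, 1, 1, 'o'); '3.0.5' gives (3, 0, 5, '')."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised OpenSSL version: {text!r}")
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


def is_older(installed, wanted):
    """Return True if 'installed' is an older release than 'wanted'."""
    return parse_openssl_version(installed) < parse_openssl_version(wanted)
```

With the versions mentioned in this thread, `is_older("1.1.1o", "1.1.1q")` is true, so the relay's OpenSSL is two letter releases behind 1.1.1q.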

Do you know what caused the issues with the non-q versions of OpenSSL? As far as I know, it should work with most if not all of the mentioned versions, right?

To be completely honest, I’m not entirely sure why it works with some and not others. I’ve compiled on macOS Monterey; i and q worked, but j and k didn’t.
I compiled on Ubuntu, and it was mostly OK, but I had to get the devel version of the latest OpenSSL, 3.0.5, when compiling 0.4.7.8.
I compiled Tor 0.4.7.10 on FreeBSD 13.1 and it took a bit. It wouldn’t compile at all with the GCC or G++ compiler; I had to remove GCC and install LLVM/Clang 14, on top of OpenSSL 1.1.1q and devel 3.0.5, to get it compiled and working correctly.

Considering the error codes and make exit codes I have been seeing, I’m fairly sure it’s a crypto library issue, as sometimes it would try to compile with the binaries but wouldn’t use the C headers for some reason. This issue was exclusive to macOS Monterey, though; there was no issue with the C headers on FreeBSD or Ubuntu.

It may very well be a combination of the OpenSSL crypto library and the GCC compiler being a bit dated by now. But the issue seemed to resolve itself, partially, when I stopped using the GNU C compiler and set the compiler flag defaults to use Clang/LLVM instead.

I was having a VERY similar issue when compiling Nmap and Zenmap from source: a very similar crypto library issue with OpenSSL 1.1.1 and its variations. Again, it compiled just fine with Clang once I had removed the (default) prebaked OpenSSL binaries and compiled 3.0.5 and 1.1.1q from source. It’s been very hit and miss tracking the issue, as for some unknown reason there were times when it would just… compile anyway despite the errors and spit out executables that would run. Very odd.

Then I would attempt another compile without any modifications, apart from running make distclean ; make -j8 to wipe the compiled code and leave just the makefiles generated by ./configure, and this time it would just straight up fail the make and exit with no binaries.

This may or may not be related, but when using GCC and letting the ./config command pick the arch and platform (the default HOST= and BUILD=) for compiling OpenSSL, it would sometimes produce binaries and headers that wouldn’t work but wouldn’t throw error codes either, but when compiling from Clang it would produce this.
So instead of taking the risk, I used ./Configure HOST=x86_64-FreeBSD-13.1
This will differ by OS and arch, but if you just run ./Configure it’ll print a help screen with a list of base archs and platforms to pick from. You can also pick a compiler instead of an OS and arch, which is nice.

But specifying the host and build arch did help, I think.