Snowflake proxy connection limit

Hello.
Recently I decided to install a Snowflake proxy to understand better how it works.
After looking at the logs, I began to suspect that it works in a kind of single-threaded way and is not using all available resources:
What I see is lots of “sdp offer successfully received” messages, each followed by a “Timed out waiting for client to open data channel” message.
The problem is that the proxy makes no new connections until the current client successfully connects or times out.
The timeout duration looks like 25 s; here is a histogram from my logs:
[histogram from the proxy logs showing the ~25 s timeout duration]
Which means that it is not possible to have more than 144 connection attempts per hour (3600 s / 25 s = 144). Is this limit intentional? If so, why? Why not process several SDP offers simultaneously?
From the logs I see that the proxy is almost always at this cap:
[chart from the proxy logs: SDP offers (yellow) and connection successes (blue) per 5-minute interval; the line of 12 connections per 5 minutes is clearly visible]


I saw an explanation about timeouts in a different topic:

But it looks like @meskio is wrong: the SDP offers are coming from clients, so these are not just timeouts without clients, they are failures.

As per @arma here:

It looks more like (single-threaded?) polling of the same broker at a different frequency than like using several brokers.

That is a better way of putting it :slight_smile:

It looks more like (single-threaded?) polling of the same broker at a different frequency than like using several brokers.

Right – there is one Snowflake broker, which clients reach via domain fronting and which Snowflakes reach directly.

Different frequency of polling is half of the story. The other half is that the Snowflakes running in browser extensions have a limit of one client they can serve at once, whereas the headless Snowflake has a configurable limit of how many clients to serve in parallel. See the “-capacity” argument to the ./proxy command:

which looks like it defaults to 0:

which I believe means “no limit”.
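
For illustration only (the type and method names below are my own, not Snowflake’s internals), a cap like “-capacity” with 0 meaning “no limit” can be modeled as a guarded counter:

```go
// Minimal sketch, assuming a "-capacity" style setting where 0 means
// "no limit". The names here are hypothetical, not Snowflake's real code.
package main

import (
	"fmt"
	"sync"
)

type proxy struct {
	mu       sync.Mutex
	capacity int // 0 is treated as "unlimited"
	clients  int
}

// tryAccept reserves a slot for a new client, or refuses if the cap is reached.
func (p *proxy) tryAccept() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.capacity > 0 && p.clients >= p.capacity {
		return false
	}
	p.clients++
	return true
}

// release frees the slot when a client session ends.
func (p *proxy) release() {
	p.mu.Lock()
	p.clients--
	p.mu.Unlock()
}

func main() {
	p := &proxy{capacity: 0}   // the reported default: unlimited
	fmt.Println(p.tryAccept()) // always true with capacity 0
	p.release()
}
```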

Load balancing is handled at the broker end: snowflakes say how many clients they’re handling right now when they check in with the broker, so the broker has the opportunity to assign clients to less-loaded snowflakes. I think the load balancing approach is very simple currently.
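
As a purely hypothetical sketch of that idea (not the broker’s actual data model or algorithm), the broker could simply prefer the proxy reporting the lowest client count among those that have checked in:

```go
// Hypothetical sketch of load balancing by reported client count;
// proxyPoll and pickProxy are illustrations, not the broker's real code.
package main

import "fmt"

type proxyPoll struct {
	ID      string
	Clients int // client count the proxy reports when it checks in
}

// pickProxy returns the least-loaded proxy, or nil if none are available.
func pickProxy(polls []proxyPoll) *proxyPoll {
	var best *proxyPoll
	for i := range polls {
		if best == nil || polls[i].Clients < best.Clients {
			best = &polls[i]
		}
	}
	return best
}

func main() {
	polls := []proxyPoll{{"a", 3}, {"b", 1}, {"c", 7}}
	fmt.Println(pickProxy(polls).ID) // prints "b"
}
```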

And as a last note, you can read a big pile of wishlist items in the GitLab tickets:


Thanks for the clarifications.
One client at a time is a very strong restriction.
I think it is better to use the standalone version when possible.

Yes, I know about it.

What I see with my proxy: 1. Requests are coming in almost non-stop. 2. Most of them fail with a timeout.
It means the network wants to put load on my proxy, but the proxy doesn’t accept that load and instead spends most of its time waiting in timeouts.
Failing connections may be a problem in themselves, but what prevents the proxy from handling each timeout in its own thread and asking for new connections at the same time? It looks like too many resources are wasted at exactly the moment they are badly needed.
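
In outline, the pattern being asked for here might look like the sketch below: each answered offer is handed to its own goroutine, so a client that never opens its data channel only costs that goroutine ~25 seconds while the main loop keeps polling. The function names are placeholders, not Snowflake’s actual code.

```go
// Sketch of concurrent client handling: polling never blocks on a
// client's data-channel timeout. pollBroker and handleClient are
// hypothetical stand-ins.
package main

import (
	"log"
	"time"
)

func pollBroker() (string, bool) {
	time.Sleep(time.Second) // stand-in for the HTTP poll to the broker
	return "sdp-offer", true
}

func handleClient(offer string) {
	// Stand-in for answering the offer and waiting for the data channel;
	// a timeout here only costs this goroutine, not the polling loop.
	time.Sleep(25 * time.Second)
	log.Println("client finished or timed out:", offer)
}

func main() {
	for {
		offer, ok := pollBroker()
		if !ok {
			continue
		}
		go handleClient(offer) // don't wait; poll for the next offer immediately
	}
}
```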


I’m guessing that you’re running a standalone proxy, and your NAT is unrestricted, right? The stats look interesting, because I too was under the impression that there are always enough proxies, but looking now at https://snowflake-broker.torproject.net/metrics (see the meanings of the stats), specifically at client-restricted-denied-count, it seems huge to me: it is on the order of the matched count, while snowflake-idle-count is also very large. Maybe what’s happening is that there are enough proxies to serve unrestricted clients, but not quite enough to serve all the restricted ones?

I believe the number 25 s comes from here: the proxy polls the broker, and if it gets a client, it tries to connect to it for 20 seconds. If that fails, it polls the broker again. If so, I agree, it seems wrong. I might be mistaken, as I’m only familiar with the extension’s codebase.
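
Based only on that reading (I have not verified it against the proxy source), the sequential behaviour would look roughly like the sketch below, which is what would cap throughput at about 3600 / 25 ≈ 144 attempts per hour:

```go
// Sketch of the suspected sequential loop: a failing client blocks the
// next broker poll for the full connection timeout. All names here are
// stand-ins, not the proxy's real functions.
package main

import (
	"errors"
	"log"
	"time"
)

func pollBroker() string {
	time.Sleep(5 * time.Second) // broker round trip
	return "sdp-offer"
}

func connectToClient(offer string) error {
	time.Sleep(20 * time.Second) // waiting for the data channel to open
	return errors.New("timed out waiting for client to open data channel")
}

func main() {
	for {
		offer := pollBroker()
		// Nothing else happens while this attempt runs, so one failing
		// client costs roughly 25 seconds of wall-clock time.
		if err := connectToClient(offer); err != nil {
			log.Println(err)
		}
	}
}
```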

I don’t think the reason is that it doesn’t want to accept the load. The “Timed out waiting for client to open data channel” message is displayed when you and the client have exchanged the offer/answer but have failed to establish a WebRTC connection for some reason.


Yes, standalone. I have no NAT (I believe).

I mean that the logic is single-threaded, and that is why, instead of accepting connections, the proxy spends most of its time just waiting. I understand that timeouts and failed attempts can happen; that is not a problem by itself.


However, after so many months of testing, I doubt that I want more connections.
It looks like Snowflake has a huge memory leak, and I don’t have enough RAM to operate a proxy with a large number of connections under such conditions.
The same ~50-100 users consume ~100 MB of RAM right after the proxy starts, but after several days of uptime RAM usage grows above 1 GB.

I haven’t looked at CPU or memory on my machine in quite a while. I checked yesterday and RAM was at 1.3 GB, a few hours later it was at 1.59 GB, and this morning it was back down to 1.29 GB. Not sure it’s a leak, but it sure is greedy. My current run of snowflake started on Dec. 1 and has been running uninterrupted since then, so I presume the RAM usage must have leveled off at some point and doesn’t get any worse. I’m using an old Mac Pro x86-64, 2x6 core with 32 GB RAM, so I didn’t notice any performance impact, but it would be pretty brutal on less RAM for sure. Your post made me have a look, so I thought I’d report in.


It is possible that Snowflake has buffers that can grow during periods of high client activity but for some reason cannot shrink back when activity becomes low again.
I see a correlation between RAM usage and client count, so such a leak may not be completely unmanaged.
But at the same time, it is very strange that right after a start the proxy requires far less RAM per user than after weeks of operation.
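
A tiny, generic Go illustration of that kind of effect (not a claim about where Snowflake’s memory actually goes): a reused slice’s capacity only grows, so memory taken during a burst is kept for as long as the slice itself is kept around.

```go
// Generic illustration of a buffer that grows during a burst and never
// returns its memory; ordinary Go slice behaviour, not Snowflake's code.
package main

import "fmt"

func main() {
	buf := make([]byte, 0, 64)

	// High activity: the buffer grows to fit a large burst (~10 MB).
	buf = append(buf, make([]byte, 10<<20)...)
	fmt.Println("after burst: len", len(buf), "cap", cap(buf))

	// Activity drops: the slice is truncated, but the backing array
	// (and its ~10 MB) stays allocated as long as buf is referenced.
	buf = buf[:0]
	fmt.Println("after reset: len", len(buf), "cap", cap(buf))
}
```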

If anyone can collect data and chart the number of TCP connections and the RAM usage of the Snowflake proxy over time (including data after a proxy restart), it may give clues about what is happening.
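
If it helps, here is a rough sampling sketch (my own, nothing to do with Snowflake itself) that writes a CSV of resident memory and open sockets for a given PID once a minute. Note that it counts all sockets rather than only TCP connections, and it reads Linux /proc, so it won’t work on other platforms.

```go
// Rough monitoring sketch: sample VmRSS and the number of open socket
// file descriptors for a PID once a minute, as CSV for later charting.
// Linux-only (reads /proc); counts all sockets, not just TCP.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

func vmRSS(pid string) string {
	f, err := os.Open("/proc/" + pid + "/status")
	if err != nil {
		return "?"
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			return strings.TrimSpace(strings.TrimPrefix(s.Text(), "VmRSS:"))
		}
	}
	return "?"
}

func socketCount(pid string) int {
	entries, err := os.ReadDir("/proc/" + pid + "/fd")
	if err != nil {
		return -1
	}
	n := 0
	for _, e := range entries {
		target, err := os.Readlink("/proc/" + pid + "/fd/" + e.Name())
		if err == nil && strings.HasPrefix(target, "socket:") {
			n++
		}
	}
	return n
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: monitor <snowflake-proxy-pid>")
		os.Exit(1)
	}
	pid := os.Args[1]
	fmt.Println("time,rss,sockets")
	for {
		fmt.Printf("%s,%s,%d\n", time.Now().Format(time.RFC3339), vmRSS(pid), socketCount(pid))
		time.Sleep(60 * time.Second)
	}
}
```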

I think it’s getting off-topic. I created an issue about the leak.

I just wanted to note that if the problem with the connection limit gets solved, the problem with RAM usage will appear right after it.
So maybe the RAM consumption problem should be fixed before the single-threaded bottleneck.