Traffic analysis determination of final address

Nameless · December 19, 2021, 6:35pm

Hi.
I have learnt that ISPs can tell which websites you visit even when the traffic is encrypted, because every site has a unique packet request pattern, which makes VPN-based privacy pretty useless. What I wanted to know is: if you combined a VPN with Tor and browsed the clearnet, would your ISP still be able to figure out which site you’re on? And more importantly, could they figure out which hidden service you use? I know Tor uses cell padding, which helps obscure things, but the size of that padding is public knowledge, so they would just subtract it from the collected data. Would downloading large empty files whilst browsing help to obscure things? Lots of the help on the GitHub wiki page redirects to a new page, so it’s confusing to match things up. Thanks.

lokodlare · December 19, 2021, 8:43pm

Your ISP can see which sites you are visiting, even if you are using an encrypted HTTPS connection, mostly because of your DNS requests. They could in theory see that you visited your bank’s website, but not the content of what you did there.

Tor (or any decent VPN) will make it very unlikely for your ISP to figure out which sites you are visiting. They only see the connection and the data flowing to your entry node / bridge (Tor) or your VPN provider’s server. That, however, does not automatically mean that you are not traceable online. There are a number of ways for different actors to find out who is visiting what and to build usage patterns. Hint: most of them come from your browser and the information it shares automatically. As this is a complex topic, I won’t get into it right now.

If you are just scared of your ISP (for whatever reason), a VPN or Tor will help.
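
On the cell padding point raised in the question, here is a rough sketch of why a fixed, publicly known cell size is not something an observer can simply subtract. It assumes 514-byte cells carrying up to 498 bytes of data each and ignores TLS and TCP framing; `padded_size` is a hypothetical helper for illustration, not part of Tor.

```python
import math

CELL_SIZE = 514     # assumption: bytes on the wire per Tor cell
CELL_PAYLOAD = 498  # assumption: usable data bytes per relay cell


def padded_size(data_len):
    """Bytes an observer would count for data_len bytes of payload.

    Ignores TLS/TCP overhead; every cell is padded to CELL_SIZE.
    """
    cells = max(1, math.ceil(data_len / CELL_PAYLOAD))
    return cells * CELL_SIZE


# Different payload lengths collapse to the same observed size.
for n in (1, 300, 498, 499, 900):
    print(n, "->", padded_size(n))
```

Under these assumptions, payloads of 1, 300 and 498 bytes all look identical on the wire, so the exact length cannot be recovered by subtraction. The number of cells, their timing and their direction remain visible, and that is what fingerprinting attacks work with.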

Nameless · December 20, 2021, 6:53pm

That is what I thought and expected, but see here.

“Using a VPN or SSH does not provide strong guarantees of hiding the fact you are using Tor from your ISP. VPN’s and SSH’s are vulnerable to an attack called Website traffic fingerprinting ^1^. Very briefly, it’s a passive eavesdropping attack: although the adversary only watches encrypted traffic from the VPN or SSH, the adversary can still guess what website is being visited, because all websites have specific traffic patterns. The content of the transmission is still hidden, but which website one connects to isn’t secret anymore. There are multiple research papers on that topic. ^2^ Once the premise is accepted that VPN’s and SSH’s can leak which website one is visiting with a high accuracy, it’s not difficult to imagine that encrypted Tor traffic hidden by a VPN or SSH could also be classified. There are no research papers on that topic.”

https://2019.www.torproject.org/projects/torbrowser/design/

"1. Local Network/ISP/Upstream Router: The adversary can also inject malicious content at the user’s upstream router when they have Tor disabled, in an attempt to correlate their Tor and Non-Tor activity. Additionally, at this position the adversary can block Tor, or attempt to recognize the traffic patterns of specific web pages at the entrance to the Tor network."

That last bit of information is one of many good reasons to be afraid of your ISP.

lokodlare · December 20, 2021, 7:46pm

I recommend reading the sources. I just picked the first one, from the Tor Project:

Website traffic fingerprinting

Website traffic fingerprinting is an attempt by the adversary to recognize the encrypted traffic patterns of specific websites. In the case of Tor, this attack would take place between the user and the Guard node, or at the Guard node itself.

The most comprehensive study of the statistical properties of this attack against Tor was done by Panchenko et al. Unfortunately, the publication bias in academia has encouraged the production of a number of follow-on attack papers claiming “improved” success rates, in some cases even claiming to completely invalidate any attempt at defense. These “improvements” are actually enabled primarily by taking a number of shortcuts (such as classifying only very small numbers of web pages, neglecting to publish ROC curves or at least false positive rates, and/or omitting the effects of dataset size on their results). Despite these subsequent “improvements”, we are skeptical of the efficacy of this attack in a real world scenario, especially in the face of any defenses.

In general, with machine learning, as you increase the number and/or complexity of categories to classify while maintaining a limit on reliable feature information you can extract, you eventually run out of descriptive feature information, and either true positive accuracy goes down or the false positive rate goes up. This error is called the bias in your hypothesis space. In fact, even for unbiased hypothesis spaces, the number of training examples required to achieve a reasonable error bound is a function of the complexity of the categories you need to classify.

In the case of this attack, the key factors that increase the classification complexity (and thus hinder a real world adversary who attempts this attack) are large numbers of dynamically generated pages, partially cached content, and also the non-web activity of the entire Tor network. This yields an effective number of “web pages” many orders of magnitude larger than even Panchenko’s “Open World” scenario, which suffered continuous near-constant decline in the true positive rate as the “Open World” size grew (see figure 4). This large level of classification complexity is further confounded by a noisy and low resolution featureset - one which is also relatively easy for the defender to manipulate at low cost.

To make matters worse for a real-world adversary, the ocean of Tor Internet activity (at least, when compared to a lab setting) makes it a certainty that an adversary attempting to examine large amounts of Tor traffic will ultimately be overwhelmed by false positives (even after making heavy tradeoffs on the ROC curve to minimize false positives to below 0.01%). This problem is known in the IDS literature as the Base Rate Fallacy, and it is the primary reason that anomaly and activity classification-based IDS and antivirus systems have failed to materialize in the marketplace (despite early success in academic literature).

Still, we do not believe that these issues are enough to dismiss the attack outright. But we do believe these factors make it both worthwhile and effective to deploy light-weight defenses that reduce the accuracy of this attack by further contributing noise to hinder successful feature extraction.
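
To put rough numbers on the Base Rate Fallacy mentioned in the quote, here is a minimal Python sketch. The 0.01% false positive rate is the figure from the quoted text; the 90% true positive rate and the 1-in-100,000 base rate are illustrative assumptions of mine, not measured values.

```python
def precision(base_rate, true_positive_rate, false_positive_rate):
    """Probability that a flagged page load really is the monitored site (Bayes' rule)."""
    true_alarms = base_rate * true_positive_rate
    false_alarms = (1.0 - base_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)


FALSE_POSITIVE_RATE = 0.0001  # 0.01%, the figure from the quoted text
TRUE_POSITIVE_RATE = 0.90     # assumption: classifier catches 90% of real visits
BASE_RATE = 1 / 100_000       # assumption: share of page loads going to the monitored site

result = precision(BASE_RATE, TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE)
print(f"Share of alerts that are correct: {result:.1%}")
```

With these assumed inputs, only about 8% of alerts would point at a real visit, and the rest would be false alarms. The rarer the monitored site is within the overall traffic, the worse this ratio gets.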

=> Yes, it is possible in theory. You’d however need considerable time and resources to determine to a reliable degree if user A has visited the specific website B in the real world. This does not mean it’s impossible - just improbable for almost any user. From what I understand, even if you are Edward Snowden such an attack is a stretch. If you are just the average Joe, I think you overestimate what your ISP is allowed to do, can technically do and/or even wants to do.

I have heard that some ISPs, for example in the US, are toying with the idea of selling their users’ browsing patterns to advertisers as an additional revenue stream. As these ISPs are still businesses and want to make a(dditional) profit, they will use the data they can get easily rather than spend an extraordinary amount of effort and money to get from 80% data coverage to 95%, 99% or 100%.

Nameless · December 20, 2021, 8:26pm

The main element of my question was whether adding Tor to a VPN would stop this method from working, or at least make it harder, and secondly whether this method could be used to determine which .onion services a person uses. The US is known to sell data for marketing, but right now in Europe many ISPs have installed a “black box” that records only VPN and Tor traffic and passes it privately to intelligence agencies. Someone on this forum posted a link about how the UK Home Office and National Crime Agency now have black box access; it’s a real thing that’s happening. Also, I may be wrong, but I think Snowden is now just at the level of your average guy. All the powers and abilities came from NSA software and networks, which were being misused for unwarranted mass surveillance. He was one of an entire agency, so I can’t imagine there is anything particularly special about him besides his correct moral compass and sense of bravery, along with his willingness to accept punishment. That said, the last thing I saw from him was about baking cakes, and it was posted to Twitter. It’s funny to know the CIA have an everlasting copy of cake-based information.

lokodlare · December 20, 2021, 9:13pm

As far as I can see - no, not really. If you are the one in one billion to get that kind of attention, I doubt adding a VPN has any meaningful effect.

Doubt. Even if true, a good amount of the people working from home due to the ongoing global pandemic will have a device that upon boot connects to some sort of company VPN. We are talking tens of millions of users every single day. This is extremely common technology which is very hard to break. That’s why it is popular.

Nameless · December 20, 2021, 9:21pm

No doubt needed, it’s confirmed.

ukmr · December 20, 2021, 10:41pm

The Internet is rife with a carousel of revolving questions & speculation regarding Tor, ruminating on the pros & cons of using Tor with & without a VPN, & this subject is very well lamented across virtually every social network.

But as @lokodlare has quite reasonably pointed out, it’s beneficial to focus on the facts available to us, including the helpful information on the Tor Project website, in tandem with the links provided to further elucidate the technical specifications of various elements of the Tor network, including onion networking.

Each person must decide for themselves on their threat model, perceived or actual. Yes, one could rightfully state “The UK has the most intrusive mass surveillance regime of any democratic country, and the security services are able to spy on everyone whether or not we’re suspected of criminal activity.” [1] However, once again, each person must decide for themselves how this informs their use of the Internet.

In conclusion, & I speak only for myself when I say this, we have the opportunity to make the Tor Forums a hybrid among online spaces where, yes, one can speculate if they choose to, but where we can also work with the facts & support one another in our endeavours to help the Tor Project & evolve with it, as a network still very much in development on an increasingly surveilled & militarised Internet. Sadly, this is the landscape before us, & we must remain resolute in our efforts & not allow our adversaries to cause any undue panic.

[1] SNOOPERS' CHARTER - Liberty

Nameless · December 21, 2021, 11:39am

I understand that, and your input is massively appreciated. Have you personally ever had any issues during your involvement with Tor? The network is indeed still under development, but sadly things like the lack of concern surrounding KAX17 and the willingness to leave anyone on mobile with no option other than outdated, vulnerable software make me feel like the adversaries are already winning from the inside out. Using a VPN with Tor would help to lessen suspicion from the ISP: a VPN is associated with ‘use discount code clonknob for 70% off, watch iPlayer abroad, watch Netflix from other countries’, whereas, let’s face it, the only time you hear of Tor is for bad things. I read a while back that if you use bare Tor so your ISP sees it, and somebody falsely reports you for online criminality, then the Tor traffic could be enough to get a warrant. If you use bare Tor and break up with your partner, you might get raided under the guise of being a dark web drug lord, all because you use Tor to browse WikiLeaks.