How to make JSoup access the web via Tor

First of all, I’d just like to emphasize – I am new to Tor and I have only very basic networking knowledge.

I am currently working on a simple web scraper using Jsoup. My goal is for the scraper to be anonymous, so that the server it accesses learns nothing about me.

I originally wanted to use a proxy, but I had no idea how to get started, and most proxies seem to cost money and be closed source.

Then, I came to the conclusion that I could make the request via a Tor Onion Proxy.

After that, I searched online for Java bindings for Tor and found a couple: thaliproject/Tor_Onion_Proxy_Library on GitHub (provides a JAR and an AAR for embedding the Tor Onion Proxy into a Java or Android program) and PanagiotisDrakatos/T0rlib4j on GitHub (a Java controller library for Tor).

The issue is that I am completely stumped as to how I would go about using these libraries, as well as how I would get JSoup to access the web via Tor. I saw someone ask a similar question, and the answerer said that I don’t even need any custom libraries/bindings for Tor and that I can just use the SOCKS5 protocol, which only adds to my confusion.

Where do I start? Are there any books or tutorials for learning about these things?

When Tor is started with its default configuration, it opens a SOCKS port on 9050.
(The SocksPort option in the torrc file lets you configure Tor to use a different port.)
This means you can start Tor and then run this program:

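package com.example.jsoup_tor;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTor {
    public static void main(String[] args) {
        try {
            // Tor's SOCKS listener: local machine, default port 9050.
            Proxy proxy = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", 9050));
            // Fetch a page that reports the IP address the request came from.
            Document doc = Jsoup.connect("https://2ip.ru").proxy(proxy).get();
            // selectFirst returns null if the page layout has changed.
            Element ipElement = doc.selectFirst("div.ip > span");
            if (ipElement == null) {
                System.out.println("Could not find the IP on the page.");
            } else {
                System.out.printf("Your IP is: %s%n", ipElement.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}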

(I’m not a Java programmer, so this code may not be perfect :slight_smile: )

Such a fantastic solution, thank you sir. Out of curiosity, why do you connect to the localhost IP?

This is where the Tor process listens for incoming SOCKS connections.
Tor needs to be running for this code to work.
It is possible to run the Tor process on a different computer, but that requires additional configuration.

Hate to be that guy, but I have one more question: if we can connect to Tor this easily, what is the point of those other libraries? Apologies for asking so many questions :slight_smile:

Besides the SOCKS port, the Tor process can also open a control port, which gives you finer control over Tor’s behaviour.
For example, a controller command can make Tor switch to new circuits, which usually changes your exit IP.
Talking to the control port by hand can be inconvenient, so library developers built abstraction layers to help with this.
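
To give an idea of what those libraries wrap, here is a rough sketch of talking to the control port directly. It assumes you have enabled ControlPort 9051 and set a control password in torrc (neither is on by default), and the class name, method names and the password string are made up for illustration. AUTHENTICATE and SIGNAL NEWNYM are commands from Tor's control protocol; NEWNYM asks Tor to use fresh circuits for new connections, which usually (not always) gives you a different exit IP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tor or of any library mentioned above.
class TorControlSketch {

    // Sends one control-protocol line and fails unless the reply starts with status 250.
    static void sendCommand(BufferedReader in, Writer out, String command) throws IOException {
        out.write(command + "\r\n"); // control-protocol lines end with CRLF
        out.flush();
        String reply = in.readLine();
        if (reply == null || !reply.startsWith("250")) {
            throw new IOException("Tor control port replied: " + reply);
        }
    }

    // Authenticates with a password, then asks Tor for fresh circuits.
    // Assumption: the password contains no quotes or backslashes (no escaping is done here).
    static void requestNewIdentity(BufferedReader in, Writer out, String password) throws IOException {
        sendCommand(in, out, "AUTHENTICATE \"" + password + "\"");
        sendCommand(in, out, "SIGNAL NEWNYM");
    }

    public static void main(String[] args) throws IOException {
        // Assumptions: ControlPort 9051 is enabled in torrc; the password is a placeholder.
        try (Socket socket = new Socket("127.0.0.1", 9051);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII)) {
            requestNewIdentity(in, out, "my-control-password");
            System.out.println("Tor accepted the NEWNYM signal");
        }
    }
}
```

A controller library takes care of this bookkeeping for you (other authentication methods, multi-line replies, events), which is the inconvenience mentioned above.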

Running your code, I get the following exception:

java.net.SocketException: Connection refused
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
	at java.base/java.net.Socket.connect(Socket.java:633)
	at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:498)
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:603)
	at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:378)
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:189)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1287)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128)
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:175)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:142)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:859)
	at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:829)
	at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:366)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:353)
	at MainKt.main(Main.kt:120)
	at MainKt.main(Main.kt)

Any idea why?

Because the program can’t connect to 127.0.0.1:9050.
Possible reasons:

  1. The Tor process is not running.
  2. Tor is configured not to open a SocksPort.
  3. Tor is configured to use a SocksPort other than 9050.
  4. Tor is configured correctly, but a firewall is blocking the connection.
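
One way to narrow these down is to check which local ports actually accept connections. The sketch below is only an illustration (the class and method names are made up): it probes 9050, the default for a standalone Tor, and 9150, the default for the Tor that ships with Tor Browser. It only tells you that something is listening on the port, not that the listener is Tor.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper for "Connection refused" errors.
class SocksPortProbe {

    // Returns true if something accepts a TCP connection on host:port within timeoutMillis.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or otherwise unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] candidates = {9050, 9150}; // standalone Tor default, Tor Browser default
        for (int port : candidates) {
            String status = isPortOpen("127.0.0.1", port, 1000)
                    ? "is accepting connections"
                    : "refused or timed out";
            System.out.printf("127.0.0.1:%d %s%n", port, status);
        }
    }
}
```

If neither port is open, Tor is probably not running or its SocksPort is disabled or set to something else in torrc.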

I fixed it by changing the port to 9150.
