First of all, I’d just like to emphasize – I am new to Tor and I have only very basic networking knowledge.
I am currently working on a simple web scraper using Jsoup. My goal is for the scraper to be anonymous, so that the server you are accessing knows nothing about you.
I originally wanted to use a proxy, but I had no idea how to even get started with this, and most proxies seem to cost money and be closed source.
Then, I came to the conclusion that I could make the request via a Tor Onion Proxy.
The issue is that I am completely stumped as to how I would go about using these libraries, as well as how I would get Jsoup to access the web via Tor. I saw someone ask a similar question, and the answerer said that I don’t even need any custom libraries/bindings for Tor and that I can just use the SOCKS5 protocol, which just adds to my confusion.
Where do I start? Are there any books or tutorials for learning about these things?
When Tor is started in its default configuration, it opens a SOCKS port on 9050.
(The SocksPort option in the torrc file can configure Tor to use a different port.)
This means that you can start Tor and run this program:
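package com.example.jsoup_tor;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupTor {
    public static void main(String[] args) {
        try {
            // Send the request through the local Tor SOCKS port instead of connecting directly
            Proxy proxy = new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("127.0.0.1", 9050));
            Document doc = Jsoup.connect("https://2ip.ru").proxy(proxy).get();
            // first() returns null when nothing matches, e.g. if the page layout has changed
            Element ipElement = doc.select("div.ip > span").first();
            if (ipElement == null) {
                System.out.println("Could not find the IP address on the page");
            } else {
                System.out.printf("Your IP is: %s%n", ipElement.html());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}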
(I’m not a Java programmer, so this code may not be perfect.)
127.0.0.1:9050 is where the Tor process listens for incoming SOCKS connections.
Tor needs to be running for this code to work properly.
It is possible to run the Tor process on a different computer, but that requires additional configuration steps.
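If you are not sure whether Tor is actually listening, a quick check like the one below may help before running the scraper. This is only a rough sketch: the class name TorPortCheck and the method isListening are made up for illustration, and port 9150 is included on the assumption that you might be running Tor Browser (which uses 9150) rather than a standalone Tor process. It only tells you that something accepts TCP connections on the port, not that it is Tor.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class TorPortCheck {
    // Returns true if something accepts TCP connections on host:port.
    // This does not prove the listener is Tor, only that the port is open.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, etc.: nothing usable is listening there
            return false;
        }
    }

    public static void main(String[] args) {
        // 9050 is the default SOCKS port mentioned above;
        // 9150 is an assumption for the case where Tor Browser is used instead.
        for (int port : new int[] {9050, 9150}) {
            boolean open = isListening("127.0.0.1", port, 2000);
            System.out.printf("127.0.0.1:%d listening: %b%n", port, open);
        }
    }
}
```

If neither port reports true, the scraper will most likely fail with a "Connection refused" error, which usually means Tor is not running or is configured with a different SocksPort.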
Hate to be that guy, but I have one more question – if we can connect to Tor this easily, what is the point of those other libraries? Apologies for asking so many questions.
Besides opening a SOCKS port, the Tor process can also open a controller port, which gives finer control over Tor’s behaviour.
For example, it is possible to change the IP with a controller command.
Communicating with the controller port manually can be inconvenient, so library developers made abstraction layers to help with this task.
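To give an idea of what talking to the controller port by hand looks like, here is a rough sketch that asks Tor for a new identity. It rests on several assumptions: the torrc file enables ControlPort 9051 (it is not open by default), authentication uses a password set through HashedControlPassword (cookie authentication would need different code), and the class and method names are made up for illustration. SIGNAL NEWNYM asks Tor to use fresh circuits for new connections, which usually, but not always, results in a different exit IP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class TorNewIdentity {
    // Builds the AUTHENTICATE line; the password is sent as a quoted string,
    // so backslashes and double quotes inside it are escaped.
    static String authenticateCommand(String password) {
        String escaped = password.replace("\\", "\\\\").replace("\"", "\\\"");
        return "AUTHENTICATE \"" + escaped + "\"\r\n";
    }

    // Sends one command and returns the first line of the reply.
    static String send(Writer out, BufferedReader in, String command) throws IOException {
        out.write(command);
        out.flush();
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the control password is passed as the first argument
        String password = args.length > 0 ? args[0] : "";
        // Assumption: torrc contains "ControlPort 9051"
        try (Socket socket = new Socket("127.0.0.1", 9051);
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.US_ASCII);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.US_ASCII))) {
            // A reply starting with 250 means the command was accepted
            System.out.println(send(out, in, authenticateCommand(password)));
            System.out.println(send(out, in, "SIGNAL NEWNYM\r\n"));
        }
    }
}
```

The controller libraries wrap this kind of exchange (authentication, reply parsing, error handling) so you don't have to write it yourself.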
java.net.SocketException: Connection refused
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.base/java.net.Socket.connect(Socket.java:633)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:498)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:603)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:378)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:189)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1287)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:175)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:142)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:859)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:829)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:366)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:353)
at MainKt.main(Main.kt:120)
at MainKt.main(Main.kt)