curl of an .onion URL through an HTTP proxy does not return the expected page source


The problem

I'm testing an HTTP proxy that wraps a SOCKS proxy (Tor). It works for normal URLs, but I'm getting strange results with some .onion addresses.

In this example, I'm pointing at "the hidden wiki". The output looks like garbage:

$ curl --proxy localhost:8118 http://kpvz7ki2v5agwt35.onion/

m�AO�@�����ۑp��ĖPbj

Background

Fetching the TORCH hidden service through the same proxy works fine:

$ curl --proxy localhost:8118 http://xmh57jrzrnw6insl.onion/

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>TORCH: Tor Search!</title>...

Similarly, normal URLs seem ok:

$ curl --proxy localhost:8118 https://check.torproject.org/ | grep Congratulations

<img alt="Congratulations. Your browser is configured to use Tor." src="/images/tor-on.png">
Congratulations. Your browser is configured to use Tor.<br>

The proxy is Polipo, running with the following configuration:

proxyName = "localhost"
proxyAddress = "127.0.0.1"
proxyPort = 8118

allowedClients = 127.0.0.1
allowedPorts = 1-65535

cacheIsShared = false
chunkHighMark = 67108864

socksParentProxy = "localhost:9050"
socksProxyType = socks5


diskCacheRoot = ""
localDocumentRoot = ""

disableLocalInterface = true
disableConfiguration = true
disableVia = true

dnsUseGethostbyname = yes

maxConnectionAge = 5m
maxConnectionRequests = 120

serverMaxSlots = 8
serverSlots = 2

tunnelAllowedPorts = 1-65535

Possible causes

My thoughts on a possible cause:

  1. The server is responding with garbage as some kind of anti-web-crawler measure.
  2. There is something wrong with the way I'm handling the response.
  3. Polipo is messing it up.
  4. Something else...
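
One way to start separating these causes is to look at the response headers instead of only the body. The sketch below is not a fix, just a diagnostic: it assumes the headers were saved with curl's -D option, and the helper name content_encoding and the file names headers.txt and body.bin are made up for illustration. It prints the Content-Encoding value, if the server sent one.

```shell
# Hypothetical helper: print the Content-Encoding value from a saved header file.
content_encoding() {
  # Match the header name case-insensitively and strip the trailing CR
  # that HTTP header lines carry.
  awk -F': *' 'tolower($1) == "content-encoding" {
    sub(/\r$/, "", $2)
    print tolower($2)
  }' "$1"
}

# Usage against the URL from the question (needs the proxy to be running):
#   curl --proxy localhost:8118 -s -D headers.txt -o body.bin http://kpvz7ki2v5agwt35.onion/
#   content_encoding headers.txt
```

If this prints something like gzip or deflate, the "garbage" may simply be a compressed body, which would point at cause 2, and re-running curl with --compressed would be a reasonable next check. If it prints nothing, the other causes remain open.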

Thoughts?

Answer

There are several issues. First, make sure that NTP is running and the system clock is synced. If the clock is off, you will have precisely this problem.

If I may ask, why not connect directly to the SOCKS port (9050) that Tor itself provides?

Also, why not specify exactly which protocol you want: SOCKS 4, 4a, or 5? With curl, you can even specify that DNS resolution is done through Tor. If you don't, your local DNS resolver has no way of resolving hidden service names. Put this line in your .curlrc file (the socks5h scheme tells curl to let the proxy resolve hostnames):

proxy = socks5h://127.0.0.1:9050

and then

curl -v http://xmh57jrzrnw6insl.onion/

will return the page correctly, provided the Tor network isn't overloaded. Check in the Tor Browser to make sure the hidden services are up.
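
If you would rather not edit .curlrc, the same thing can be done per command with curl's --socks5-hostname option, which behaves like a socks5h:// proxy. A minimal sketch, assuming Tor listens on its default 127.0.0.1:9050; the wrapper name tor_curl and the TOR_SOCKS variable are made up for illustration:

```shell
# Assumed default Tor SOCKS address; override by exporting TOR_SOCKS.
TOR_SOCKS="${TOR_SOCKS:-127.0.0.1:9050}"

# Hypothetical wrapper: run curl through Tor's SOCKS5 port and let the
# proxy resolve hostnames, so .onion names never reach the local resolver.
tor_curl() {
  curl --socks5-hostname "$TOR_SOCKS" "$@"
}

# Usage (needs a running Tor):
#   tor_curl -v http://xmh57jrzrnw6insl.onion/
```

This bypasses Polipo entirely, so comparing its output with the output through localhost:8118 should also show whether the HTTP proxy layer is what alters the response.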