I have been trying for ages to compute the BitTorrent info hash in Java, but the result always comes out wrong.
I have narrowed it down to a few lines of code where I'm 99% sure the problem is:
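```java
// Bencode, Type and BencodeOutputStream come from the bencoding library in use;
// DigestUtils is from Apache Commons Codec.
Bencode bencode = new Bencode(Charset.forName("UTF-8"));
byte[] fileBytes = new byte[33237]; // filled with the contents of the .torrent file
Map<String, Object> dict = bencode.decode(fileBytes, Type.DICTIONARY);
Map<String, Object> infoMap = (Map<String, Object>) dict.get("info");
// Re-encode the info dictionary and hash the resulting bytes
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BencodeOutputStream bos = new BencodeOutputStream(baos);
bos.writeDictionary(infoMap);
byte[] hash = DigestUtils.sha1(baos.toByteArray());
```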
I have hardcoded the size of the array so that no trailing zero bytes are left over and skew the result.
I have tried with both `UTF-8` and `US-ASCII`.
I have also tried two different bencoding libraries, so the problem is probably not there.
Edit: From the spec it seems that the `info_hash` should be the URL-encoded SHA-1 hash of the info dictionary. So I tried writing the dictionary out into a `ByteArrayOutputStream` and then doing the SHA-1 hashing on the `byte[]` that the `ByteArrayOutputStream` is holding.
Does the `DigestUtils.sha1` method also do the URL encoding? I can't find any information on that.
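For reference, `DigestUtils.sha1(byte[])` only computes the digest: it returns the 20 raw hash bytes and does no URL encoding. In the BitTorrent spec the URL encoding is a separate step, applied to those 20 bytes when the `info_hash` parameter of the tracker request is built. The following is a minimal sketch of both steps. It uses `java.security.MessageDigest` instead of `DigestUtils` so that it has no dependencies. The class name `InfoHashEncoding` and the toy dictionary are hypothetical, and leaving the RFC 3986 unreserved characters unescaped is an assumption about what trackers accept.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class InfoHashEncoding {

    // SHA-1 over the exact bencoded bytes of the info dictionary (20 raw bytes).
    public static byte[] sha1(byte[] bencodedInfo) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(bencodedInfo);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    // Percent-encode raw bytes for a query parameter. Unreserved characters
    // (letters, digits, '-', '.', '_', '~') pass through; every other byte becomes %XX.
    public static String urlEncodeBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy bencoded dictionary standing in for a real info dictionary (hypothetical data).
        byte[] info = "d6:lengthi3e4:name5:a.txte".getBytes(StandardCharsets.US_ASCII);
        byte[] hash = sha1(info);
        System.out.println("raw hash length: " + hash.length); // 20
        System.out.println("info_hash=" + urlEncodeBytes(hash));
    }
}
```

The hash is computed over bytes, never over a `String`. Only the final 20-byte result is turned into text, and only for the tracker URL.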
The problem, as Encombe pointed out, was the character encoding. The Bencode specification talks about byte strings, which suggests they are just raw bytes with no character encoding at all.
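To illustrate why the charset matters, this sketch (hypothetical class `ByteStringRoundTrip`) decodes a few arbitrary binary bytes into a `String` and encodes them back. This is roughly what a bencode library that stores byte strings as `String` ends up doing when it re-encodes the info dictionary. With UTF-8 and US-ASCII, invalid sequences are replaced, so the re-encoded bytes (and therefore their SHA-1) no longer match the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteStringRoundTrip {

    // Decode raw bytes into a String and encode it back with the same charset,
    // as a bencode library that stores byte strings as String effectively does.
    public static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Arbitrary binary bytes, like those inside the "pieces" hashes.
        byte[] raw = {(byte) 0xC3, 0x28, (byte) 0x9D, (byte) 0xFF, 0x41};
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            boolean intact = Arrays.equals(raw, roundTrip(raw, cs));
            // Prints false for UTF-8 and US-ASCII, true for ISO-8859-1.
            System.out.println(cs.name() + " preserves bytes: " + intact);
        }
    }
}
```

In Java, ISO-8859-1 maps each of the 256 byte values to its own `char` and back, so it happens to survive the round trip. Keeping byte strings as `byte[]` avoids depending on that.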
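Both of the libraries I looked at converted every byte string to a `String` using some character encoding, which mangles binary values such as the concatenated SHA-1 hashes in the `pieces` field. So I wrote a Bencode library that only does that conversion when specifically asked to.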
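A minimal sketch of that idea follows. It is not the author's library, only an illustration under stated assumptions. The hypothetical `RawBencode` class keeps byte strings as `byte[]` and turns only dictionary keys into `String`s, using ISO-8859-1 so no byte is lost. Dictionaries are stored in a `LinkedHashMap` so that re-encoding reproduces the key order found in the file. For canonically encoded input, `encode(decode(bytes))` should therefore give back the original bytes. Error handling is deliberately minimal.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal bencode codec: byte strings stay byte[], integers are Long,
// lists are List<Object>, dictionaries are Map<String, Object> in file order.
public final class RawBencode {
    private final byte[] in;
    private int pos;

    private RawBencode(byte[] in) { this.in = in; }

    public static Object decode(byte[] input) {
        RawBencode d = new RawBencode(input);
        Object value = d.readValue();
        if (d.pos != input.length) throw new IllegalArgumentException("trailing data");
        return value;
    }

    private Object readValue() {
        byte b = in[pos];
        if (b == 'i') {                        // integer: i<digits>e
            pos++;
            return Long.parseLong(readUntil('e'));
        } else if (b == 'l') {                 // list: l<values>e
            pos++;
            List<Object> list = new ArrayList<>();
            while (in[pos] != 'e') list.add(readValue());
            pos++;
            return list;
        } else if (b == 'd') {                 // dictionary: d<key><value>...e
            pos++;
            Map<String, Object> map = new LinkedHashMap<>(); // keeps file order
            while (in[pos] != 'e') {
                String key = new String(readBytes(), StandardCharsets.ISO_8859_1);
                map.put(key, readValue());
            }
            pos++;
            return map;
        } else if (b >= '0' && b <= '9') {     // byte string: <len>:<bytes>
            return readBytes();
        }
        throw new IllegalArgumentException("unexpected byte at offset " + pos);
    }

    // Reads "<length>:<bytes>" and returns the bytes untouched.
    private byte[] readBytes() {
        int len = Integer.parseInt(readUntil(':'));
        if (len < 0 || len > in.length - pos) throw new IllegalArgumentException("bad length");
        byte[] out = Arrays.copyOfRange(in, pos, pos + len);
        pos += len;
        return out;
    }

    // Returns the ASCII text up to the terminator and consumes the terminator.
    private String readUntil(char terminator) {
        int start = pos;
        while (in[pos] != terminator) pos++;
        String s = new String(in, start, pos - start, StandardCharsets.US_ASCII);
        pos++;
        return s;
    }

    public static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(value, out);
        return out.toByteArray();
    }

    private static void write(Object v, ByteArrayOutputStream out) {
        if (v instanceof byte[]) {
            byte[] bytes = (byte[]) v;
            ascii(bytes.length + ":", out);
            out.write(bytes, 0, bytes.length);
        } else if (v instanceof Long || v instanceof Integer) {
            ascii("i" + v + "e", out);
        } else if (v instanceof List) {
            out.write('l');
            for (Object item : (List<?>) v) write(item, out);
            out.write('e');
        } else if (v instanceof Map) {
            out.write('d');
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                write(((String) e.getKey()).getBytes(StandardCharsets.ISO_8859_1), out);
                write(e.getValue(), out);
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("cannot bencode " + v);
        }
    }

    private static void ascii(String s, ByteArrayOutputStream out) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

With this, the info hash would be computed along the lines of `Map<?, ?> root = (Map<?, ?>) RawBencode.decode(torrentBytes);` followed by `byte[] hash = DigestUtils.sha1(RawBencode.encode(root.get("info")));`. Because the `pieces` value is never decoded to text, the re-encoded bytes match the file.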
The code above is basically correct, but here is the client code I am using now: