Attempts to parse bencode / torrent file in R


I would like to parse torrent files automatically in R. I tried the R-bencode package:

library('bencode')
test_torrent <- readLines('/home/user/Downloads/some_file.torrent', encoding = "UTF-8")
decoded_torrent <- bencode::bdecode(test_torrent)

but got this error:

Error in bencode::bdecode(test_torrent) : 
  input string terminated unexpectedly

In addition, if I try to parse just part of this file with bdecode('\xe7\xc9\xe0\b\xfbD-\xd8\xd6(\xe2\004>\x9c\xda\005Zar\x8c\xdfV\x88\022t\xe4գi]\xcf'), I get:

Error in bdecode("\xe7\xc9\xe0\b\xfbD-\xd8\xd6(\xe2\004>\x9c\xda\005Zar\x8c\xdfV\x88\022t\xe4գi]\xcf") : 
  Wrong encoding '�'. Allowed values are i, l, d or a digit.

Are there other ways to do this in R? Or could I call code written in another language from an R script? Thanks in advance!


There are 2 best solutions below


There seem to be several issues here.

Firstly, your code should not treat torrent files as UTF-8 text files. The data a torrent describes is split into equally sized pieces (except possibly the last one), and the torrent file contains a concatenation of the raw SHA-1 hashes of those pieces. Raw SHA-1 hashes are arbitrary bytes and are unlikely to be valid UTF-8 strings.
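To illustrate the point about hashes not being valid UTF-8, here is a small sketch using base R's validUTF8. The three bytes are the first bytes of the sample string from the question; the variable names are only for illustration.

```r
# Three bytes taken from the question's sample string (\xe7\xc9\xe0).
hash_bytes <- as.raw(c(0xe7, 0xc9, 0xe0))
hash_text <- rawToChar(hash_bytes)

validUTF8(hash_text)       # FALSE: 0xe7 must be followed by continuation bytes
validUTF8("d8:announce")   # TRUE: the plain ASCII parts of a torrent are fine
```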

So, you should not read the file into memory using readLines, because that is for text files. Instead, you should use a connection:

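# Open the file as a binary connection so that no text decoding is applied
test_torrent <- file("/home/user/Downloads/some_file.torrent")
open(test_torrent, "rb")
decoded_torrent <- bencode::bdecode(test_torrent)
close(test_torrent)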
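If you only want to inspect the bytes yourself, base R's readBin reads a file as a raw vector without any text decoding. The sketch below is self-contained: it writes a tiny made-up bencoded dictionary to a temporary file (the sample data and variable names are assumptions for illustration, not a real torrent) and reads it back.

```r
# Build a tiny bencoded dictionary containing a 3-byte binary string
# (hypothetical sample data standing in for a real torrent file).
sample_bytes <- c(charToRaw("d6:pieces3:"),
                  as.raw(c(0xe7, 0xc9, 0xe0)),
                  charToRaw("e"))
path <- tempfile(fileext = ".torrent")
writeBin(sample_bytes, path)

# Read the whole file back as raw bytes; no text decoding takes place.
raw_torrent <- readBin(path, what = "raw", n = file.size(path))

identical(raw_torrent, sample_bytes)  # TRUE: the bytes survive the round trip
rawToChar(raw_torrent[1])             # "d": the file starts with a dictionary
```

For a real file, replace path with the location of the torrent. A well-formed torrent should start with the byte "d".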

Secondly, it seems that the library itself suffers from a similar issue: readChar, which it uses internally, also assumes that it is dealing with text. This might be due to changes in recent R versions, seeing as the library is over 6 years old. I was able to apply a quick hack and get it working by passing useBytes=TRUE to readChar.

https://github.com/UkuLoskit/R-bencode/commit/b97091638ee6839befc5d188d47c02567499ce96
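As a rough illustration of what useBytes=TRUE changes, the sketch below calls base R's readChar on a binary connection and counts nchars in bytes rather than characters. It is not the code from the commit; the sample data and variable names are made up for the example.

```r
# Hypothetical sample: an ASCII prefix followed by three non-UTF-8 bytes.
sample_bytes <- c(charToRaw("d6:pieces3:"),
                  as.raw(c(0xe7, 0xc9, 0xe0)),
                  charToRaw("e"))
path <- tempfile(fileext = ".torrent")
writeBin(sample_bytes, path)

con <- file(path, "rb")
# With useBytes = TRUE, nchars is a byte count, so binary data is not
# interpreted as multi-byte characters.
prefix <- readChar(con, nchars = 11, useBytes = TRUE)
hash   <- readChar(con, nchars = 3, useBytes = TRUE)
close(con)

prefix                        # "d6:pieces3:"
nchar(hash, type = "bytes")   # 3
```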

You can install my version as follows:

install.packages("devtools")
library(devtools)
devtools::install_github("UkuLoskit/R-bencode")

Caveat lector! I'm not an R programmer :).

0
On

It might be that the torrent file is somehow corrupted.

A bencode value must begin with the character i (for integers), l (for lists), d (for dictionaries) or a digit (the start of a string's length prefix).

The example string ('\xe7\xc9...'), doesn't start with any of those characters, and hence it can't be decoded.
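To make the rule concrete, here is a minimal sketch of a bencode decoder that works on a raw vector and dispatches on the first byte of each value. It is an illustration only, not the R-bencode package's implementation; bdecode_raw and the sample data are hypothetical names made up for this example. Strings are returned as raw vectors because they may hold binary data such as piece hashes.

```r
# Minimal bencode decoder over a raw vector (illustrative sketch).
bdecode_raw <- function(bytes) {
  pos <- 1L
  byte_of <- function(ch) charToRaw(ch)

  # Return the current byte, failing cleanly on truncated input.
  peek <- function() {
    if (pos > length(bytes)) stop("input terminated unexpectedly")
    bytes[pos]
  }

  # Read ASCII text up to (not including) stop_char, then skip stop_char.
  read_until <- function(stop_char) {
    start <- pos
    while (peek() != byte_of(stop_char)) pos <<- pos + 1L
    text <- rawToChar(bytes[seq.int(start, length.out = pos - start)])
    pos <<- pos + 1L
    text
  }

  decode_value <- function() {
    b <- peek()
    if (b == byte_of("i")) {                    # integer: i<digits>e
      pos <<- pos + 1L
      as.numeric(read_until("e"))
    } else if (b == byte_of("l")) {             # list: l<values>e
      pos <<- pos + 1L
      out <- list()
      while (peek() != byte_of("e")) out[[length(out) + 1L]] <- decode_value()
      pos <<- pos + 1L
      out
    } else if (b == byte_of("d")) {             # dictionary: d<key><value>...e
      pos <<- pos + 1L
      out <- list()
      while (peek() != byte_of("e")) {
        key <- rawToChar(decode_value())        # keys are byte strings
        out[[key]] <- decode_value()
      }
      pos <<- pos + 1L
      out
    } else if (as.integer(b) >= 48L && as.integer(b) <= 57L) {
      n <- as.integer(read_until(":"))          # string: <length>:<bytes>
      if (pos + n - 1L > length(bytes)) stop("input terminated unexpectedly")
      value <- bytes[seq.int(pos, length.out = n)]
      pos <<- pos + n
      value                                     # kept raw, may be binary
    } else {
      stop("not a bencode value: first byte must be i, l, d or a digit")
    }
  }

  decode_value()
}

# Hypothetical sample: a dictionary with an integer, a list and a binary string.
sample_bytes <- c(charToRaw("d3:fooi42e4:listl1:a1:be6:pieces3:"),
                  as.raw(c(0xe7, 0xc9, 0xe0)),
                  charToRaw("e"))
decoded <- bdecode_raw(sample_bytes)
decoded$foo      # 42
decoded$pieces   # the three raw bytes e7 c9 e0

# A fragment starting with 0xe7, like the one in the question, is rejected.
fragment_result <- tryCatch(bdecode_raw(as.raw(c(0xe7, 0xc9, 0xe0))),
                            error = function(e) "rejected")
```

The fragment fails for the same reason as in the question: it is a slice from the middle of a string value, so its first byte is not a valid type marker.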

See this for more info on the bencode format.