Strange (Unicode?) encoding in PDF


I have a PDF file in Hebrew that displays correctly, but when I copy and paste its text, the result is gibberish. Using PDFMiner and 'xxd', I can see an encoding very similar to UTF-8, but with some shift.

The Hebrew word 'מגרסת', which is {d79e d792 d7a8 d7a1 d7aa} in UTF-8, is encoded here as {c39e c392 c3a8 c3a1 c3aa}.
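Decoding both byte sequences as UTF-8 makes the shift concrete. The small Python check below (the helper name `code_points` is just for illustration) uses only the bytes quoted above. It suggests that the extracted characters are U+00DE, U+00D2 and so on (Latin-1 Supplement letters such as 'Þ'), while the expected ones are U+05DE, U+05D2 and so on. Each character seems to have kept only the low byte of its Hebrew code point. This describes these five characters only; it is not a claim about how the PDF was produced.

```python
# Byte sequences copied from the question.
extracted_bytes = bytes.fromhex("c39ec392c3a8c3a1c3aa")  # what PDFMiner/xxd showed
expected_bytes = bytes.fromhex("d79ed792d7a8d7a1d7aa")   # UTF-8 for the Hebrew word


def code_points(data: bytes) -> list[str]:
    """Decode UTF-8 bytes and list each character's code point as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]


print(code_points(extracted_bytes))  # ['U+00DE', 'U+00D2', 'U+00E8', 'U+00E1', 'U+00EA']
print(code_points(expected_bytes))   # ['U+05DE', 'U+05D2', 'U+05E8', 'U+05E1', 'U+05EA']

# Difference between paired code points; a single value means a constant shift.
offsets = {
    ord(b) - ord(a)
    for a, b in zip(extracted_bytes.decode("utf-8"), expected_bytes.decode("utf-8"))
}
print(offsets)  # {1280}, i.e. 0x500
```

A constant offset of 0x500 is also why a bytewise c3 to d7 substitution would work here. In UTF-8, U+00C0..U+00FF and U+05C0..U+05FF share the same continuation byte and differ only in the lead byte.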

Is this a known encoding?

Of course, I could write a small routine that changes every c3 prefix to d7, but I'd rather use 'iconv' if possible.
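For reference, here is a minimal sketch of that small routine in Python (PDFMiner is a Python library). It does not use iconv. The function names are hypothetical. The sketch makes two assumptions: the extracted text is valid UTF-8, and every c3-prefixed character in it is a shifted Hebrew letter. Genuine Latin-1 characters such as 'é' (c3 a9) would be rewritten too, so this only fits text that contains no real Latin-1 accented letters.

```python
def fix_shifted_bytes(raw: bytes) -> str:
    """Swap every UTF-8 lead byte 0xC3 for 0xD7, then decode.

    0xC3 never occurs as a UTF-8 continuation byte (those are 0x80-0xBF),
    so the swap cannot split or corrupt other multi-byte sequences.
    Assumption: every 0xC3-led character is a shifted Hebrew letter.
    """
    return raw.replace(b"\xc3", b"\xd7").decode("utf-8")


def fix_shifted_text(text: str) -> str:
    """Same fix on already-decoded text: move U+00C0..U+00FF up by 0x500.

    This is exactly what the c3 -> d7 byte swap does at the code-point level.
    """
    return "".join(
        chr(ord(ch) + 0x500) if 0xC0 <= ord(ch) <= 0xFF else ch
        for ch in text
    )


extracted = bytes.fromhex("c39ec392c3a8c3a1c3aa")
expected = "\u05de\u05d2\u05e8\u05e1\u05ea"  # the Hebrew word from the question

print(fix_shifted_bytes(extracted) == expected)                 # True
print(fix_shifted_text(extracted.decode("utf-8")) == expected)  # True
```

Use the `bytes` variant on raw output, such as a file written by PDFMiner. If the text comes from pdfminer.six's `extract_text`, it is already a `str`, so the `str` variant fits better. Both variants leave ASCII (spaces, digits, Latin letters) unchanged.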

