I have a PDF file in Hebrew that displays correctly, but copy-pasting from it produces gibberish. Using PDFMiner and 'xxd', I can get an encoding that is very similar to Unicode, but with some shift.
The Hebrew word 'מגרסת', which is {d79e d792 d7a8 d7a1 d7aa} in Unicode (UTF-8), is encoded here as {c39e c392 c3a8 c3a1 c3aa}.
Is it a known encoding?
Of course, I can write a small routine that changes all the c3 prefixes to d7, but I'd rather use 'iconv', if that's possible.
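For reference, the routine I have in mind would look roughly like the sketch below. In UTF-8, the c3 prefix covers U+00C0..U+00FF and the d7 prefix covers U+05C0..U+05FF, so swapping the prefix is the same as adding 0x500 to the code point. The function name `fix_hebrew` and the range U+00D0..U+00EA (the Hebrew letters alef through tav, shifted down) are my own assumptions; a genuine Latin-1 character in that range, such as '×' (U+00D7), would be remapped as well, and vowel points are not handled.

```python
# Hypothetical helper: shift mis-decoded code points back into the Hebrew block.
# Assumption: every character in U+00D0..U+00EA is really a Hebrew letter.
HEBREW_SHIFT = 0x0500  # U+00DE -> U+05DE, i.e. UTF-8 prefix c3 -> d7


def fix_hebrew(text: str) -> str:
    """Map U+00D0..U+00EA to U+05D0..U+05EA and leave everything else alone."""
    return "".join(
        chr(ord(ch) + HEBREW_SHIFT) if 0xD0 <= ord(ch) <= 0xEA else ch
        for ch in text
    )


# The bytes extracted from the PDF for the sample word.
garbled = bytes.fromhex("c39ec392c3a8c3a1c3aa").decode("utf-8")
print(fix_hebrew(garbled))
```

This works on decoded strings rather than raw bytes, so it does not depend on how the surrounding text is encoded.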