CGPDFStringGetBytePtr returns an incorrect string while scanning a PDF


I have a PDF that I am scanning with CGPDFScanner. When the word "file" is encountered, CGPDFStringGetBytePtr returns "\x02le". The PDF uses a Type1 font and has no /ToUnicode CMap. The encoding dictionary is not present in the PDF, so I am decoding with NSUTF8StringEncoding. I have also tried NSMacOSRomanStringEncoding and NSASCIIStringEncoding, with no luck. What could be the problem?

Thanks.

Best answer:

The code \x02 corresponds to the string "fi". The "fi" sequence is drawn with a ligature, which is why there is only one character code for those two letters.
The mapping between codes and glyphs is defined by the font encoding. The font encoding contains a /Differences array that maps code \x02 to the glyph named fi, that is, the "fi" ligature.
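Here is a minimal sketch of how a scanner could resolve such codes. It is written in Swift with plain standard-library values instead of CoreGraphics objects, and it assumes the /Differences array has already been read from the font's /Encoding dictionary (when that entry is absent, the mapping instead comes from the font program's built-in encoding, which this sketch does not handle). The enum `DifferencesItem`, the functions `parseDifferences` and `decode`, and the table `glyphNameToText` are hypothetical names chosen for illustration. The glyph-name table is only a tiny subset of the Adobe Glyph List. It deliberately expands the fi ligature to the two letters "fi" (the Adobe Glyph List maps the glyph name fi to U+FB01) so that "file" stays searchable. Codes not listed in /Differences are approximated as printable ASCII, which is a stand-in for the font's real base encoding.

```swift
// A /Differences array is a sequence of integers and glyph names:
// an integer sets the next code, and each following name is assigned
// to that code, with the code incremented after every name.
enum DifferencesItem {
    case code(Int)
    case name(String)
}

// Build a code -> glyph-name table from a /Differences array.
func parseDifferences(_ items: [DifferencesItem]) -> [UInt8: String] {
    var table: [UInt8: String] = [:]
    var current = 0
    for item in items {
        switch item {
        case .code(let start):
            current = start
        case .name(let glyph):
            // Simple-font codes are single bytes; ignore anything out of range.
            if (0...255).contains(current) {
                table[UInt8(current)] = glyph
            }
            current += 1
        }
    }
    return table
}

// Hypothetical, incomplete glyph-name -> text table (tiny subset of the
// Adobe Glyph List); ligatures are expanded to their component letters.
let glyphNameToText: [String: String] = [
    "fi": "fi", "fl": "fl", "ff": "ff", "ffi": "ffi", "ffl": "ffl",
    "space": " ", "hyphen": "-",
]

// Decode the bytes of a PDF string: /Differences first, then a
// printable-ASCII approximation of the base encoding.
func decode(_ bytes: [UInt8], differences: [UInt8: String]) -> String {
    var result = ""
    for byte in bytes {
        if let glyph = differences[byte] {
            if let text = glyphNameToText[glyph] {
                result += text
            } else if glyph.unicodeScalars.count == 1 {
                // Single-letter glyph names such as "a" name that letter.
                result += glyph
            } else {
                result += "\u{FFFD}" // unknown glyph name
            }
        } else if byte >= 0x20 && byte < 0x7F {
            result.unicodeScalars.append(UnicodeScalar(byte))
        } else {
            result += "\u{FFFD}" // code not covered by this sketch
        }
    }
    return result
}

// "file" drawn with the fi ligature: code 0x02 followed by "le".
let differences = parseDifferences([.code(2), .name("fi"), .name("fl")])
let word = decode([0x02, 0x6C, 0x65], differences: differences)
print(word) // prints "file"
```

In a real CGPDFScanner callback, the byte array would come from the string operand and the /Differences entries from the current font's dictionary. A /ToUnicode CMap, when present, should take precedence over this glyph-name route.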