PDF conversion with poppler-utils: Is there a way to avoid decoding difficulties?

I am converting PDFs to text on Ubuntu using the pdftotext tool from poppler-utils. Unfortunately, I keep running into a problem where some files are not converted correctly.

A correctly converted file looks like this:

  82 => '23:00 23:00 - 05:00 05:00 01:30',
  83 => 'Page 1 of 5',
  84 => 'Generated on Feb 05, 2023 17:11',

But some files result in something like this:

  82 => 'WĂƌƚŝĂůK&&;ĞŶĐƌŽĂĐŚĞĚďLJ',
  83 => 'ĚƵƚLJͿ',
  84 => 'ϬϬ͗ϭϯͲϮϯ͗ϱϵ D',

Both documents are PDF version 1.4 and appear to have been produced with the same software, so I'm at a loss as to what is causing this problem.
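As a stopgap, the garbled output can at least be flagged automatically. The following is only a minimal sketch, assuming the documents are expected to contain mostly ASCII text, as in the correctly converted sample. The function names `non_ascii_ratio` and `looks_garbled` and the 0.3 threshold are hypothetical choices for illustration, not part of poppler-utils. It detects the problem but does not fix it.

```python
def non_ascii_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that fall outside ASCII."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if ord(c) > 127) / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Heuristic: treat a line as garbled if too many characters are non-ASCII.

    The threshold is an assumption and would need tuning for documents
    that legitimately contain accented or non-Latin text.
    """
    return non_ascii_ratio(text) > threshold


if __name__ == "__main__":
    # Sample lines taken from the two conversions shown above.
    for line in ["Page 1 of 5", "ĚƵƚLJͿ"]:
        print(repr(line), looks_garbled(line))
```

Files whose lines are mostly flagged could then be set aside for a different extraction route, for example OCR.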

Does anyone have a suggestion for what to try next?
