I am converting PDF files to text on Ubuntu using the pdftotext tool from poppler-utils. Unfortunately, I keep running into a problem where some files are not converted correctly.
A correctly converted file looks like this:
82 => '23:00 23:00 - 05:00 05:00 01:30',
83 => 'Page 1 of 5',
84 => 'Generated on Feb 05, 2023 17:11',
But some files result in something like this:
82 => 'WĂƌƚŝĂůK&&;ĞŶĐƌŽĂĐŚĞĚďLJ',
83 => 'ĚƵƚLJͿ',
84 => 'ϬϬ͗ϭϯͲϮϯ͗ϱϵ D',
Both documents are PDF version 1.4 and appear to have been produced by the same software, so I'm at a loss as to what is causing this problem.
Does anyone have a suggestion for what to try next?
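To at least detect the affected files automatically, I put together a rough heuristic. This is only a sketch: the function name `looks_garbled` and the 0.3 threshold are my own assumptions, and it relies on the fact that my correctly converted output is plain ASCII. It would misfire on documents that legitimately contain a lot of non-ASCII text.

```python
def looks_garbled(line: str, threshold: float = 0.3) -> bool:
    """Heuristic check: flag a line when a large share of its
    non-whitespace characters fall outside the ASCII range.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    chars = [c for c in line if not c.isspace()]
    if not chars:
        # Empty or whitespace-only lines carry no evidence either way.
        return False
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    return non_ascii / len(chars) > threshold


if __name__ == "__main__":
    # Sample lines copied from the two outputs shown above.
    samples = [
        "Page 1 of 5",
        "Generated on Feb 05, 2023 17:11",
        "WĂƌƚŝĂůK&&;ĞŶĐƌŽĂĐŚĞĚďLJ",
        "ϬϬ͗ϭϯͲϮϯ͗ϱϵ D",
    ]
    for s in samples:
        print(repr(s), looks_garbled(s))
```

This only tells me which files are affected, not why the conversion goes wrong.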