I am experimenting with xpdf (pdftotext) in the macOS Terminal, using one language package (Japanese). Everything works fine if I call the executable from the lib directory like this:
lib kelly$ ./p2t -enc UTF-8 jp.pdf
with this directory structure:
files/lib/pdftotext
files/lib/xpdfrc
files/lib/jp.pdf #file to convert
files/options/Enc/jp/ # Here I have the language package files
and the following edited xpdfrc configuration file:
#----- begin Japanese support package (2011-sep-02)
cidToUnicode Adobe-Japan1 ../options/Enc/jp/Adobe-Japan1.cidToUnicode
unicodeMap ISO-2022-JP ../options/Enc/jp/ISO-2022-JP.unicodeMap
unicodeMap EUC-JP ../options/Enc/jp/EUC-JP.unicodeMap
unicodeMap Shift-JIS ../options/Enc/jp/Shift-JIS.unicodeMap
cMapDir Adobe-Japan1 ../options/Enc/jp/CMap
toUnicodeDir ../options/Enc/jp/CMap
#----- end Japanese support package
The problem arises when I call 'pdftotext' from a different directory, for example from 'files'. In this case, the files that the configuration file points to are not found.
files kelly$ ./lib/p2t -enc UTF-8 ./lib/jp.pdf
I get the following error:
Syntax Error: Unknown character collection 'Adobe-Japan1'
And the generated file is garbage.
Any idea on how the configuration file needs to be changed?
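One possible workaround is sketched below. It assumes that xpdf resolves the relative paths in xpdfrc against the current working directory rather than against the location of the config file, which the error above suggests but which is not confirmed here. The helper name `absolutize_xpdfrc` and the output file name `xpdfrc.abs` are hypothetical. The sketch rewrites every `../options/` prefix into an absolute path, so the resulting config no longer depends on where the command is run from.

```shell
#!/bin/sh
# Hypothetical helper (not part of xpdf): print a copy of an xpdfrc file
# in which every relative "../options/" prefix is replaced by an absolute one.
#   $1: path of the xpdfrc file to read
#   $2: absolute path of the directory that contains "options" (here: files)
# Note: $2 must not contain the characters '|' or '&', which sed would interpret.
absolutize_xpdfrc() {
    sed "s|\.\./options/|$2/options/|g" "$1"
}

# Example usage, assuming the layout from the question and that this script
# is stored in files/lib next to the executable:
#   base=$(cd "$(dirname "$0")/.." && pwd)        # absolute path of files
#   absolutize_xpdfrc "$base/lib/xpdfrc" "$base" > "$base/lib/xpdfrc.abs"
#   "$base/lib/p2t" -cfg "$base/lib/xpdfrc.abs" -enc UTF-8 "$base/lib/jp.pdf"
```

The `-cfg` option of xpdf's pdftotext reads the given file in place of the default config file; whether the renamed `p2t` binary behaves the same way is an assumption. Writing absolute paths into xpdfrc by hand would have the same effect.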
I was able to solve a similar problem. I installed pdftotext with a brew cask.
The installation was done with the following command
I then placed the xpdfrc file and the language support package in the following directory.
I downloaded the Japanese language pack from https://www.xpdfreader.com/download.html
The contents of xpdfrc are as follows