Multibyte characters reading problem in IronPdf

I am trying out IronPDF. I want to read PDF metadata with IronPDF and insert it into a database. However, some "ı" characters in the metadata are not read by IronPDF; blank spaces are left in place of these characters. Here is my code sample:

using IronPdf;
// Load the PDF and read the Title field from its metadata
var md = PdfDocument.FromFile("___PATH OF PDF FILE___");
var article_title = md.MetaData.Title;

When I copy and paste the string into Notepad++, it looks like this:

[screenshot: the pasted string in Notepad++]

And here is a screenshot of the application view:

[screenshot: application view]

Is there a way to solve this problem, or is this a bug in IronPDF? If everything goes well, I am thinking of buying it. But if it fails on the first try, I will move on to iTextSharp.

EDIT: First of all, my apologies: Windows caught me by surprise. I struggled all day to get a new system up, and unfortunately Visual Studio etc. are still not installed. I have added one of the files I had problems with below; the IronPDF version shows as 2019.7.0.0.

PDF file: https://yadi.sk/d/HwP9JWRWTzMlSA

BEST ANSWER

First of all, since you haven't provided a sample PDF to work with, I googled for Turkish PDF documents whose metadata contains Turkish characters. This is the file I came up with: link

[screenshot: document properties of the sample PDF]

As you can see above, the Author metadata field contains the Turkish character ı.

Then I created a .NET Fiddle to test this file with IronPDF (using the latest available version, since you didn't specify one): sample using IronPDF

The output from this sample is ElifCakroglu, which shows the exact same symptom when copied to Notepad++:

[screenshot: the output pasted into Notepad++]
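To see what is actually in the returned string, rather than relying on how an editor renders it, one option is to dump its code points. This is only a minimal sketch using the .NET base library: `CodePointDump` and `Describe` are names made up for illustration, and the hard-coded string is an assumed stand-in for whatever the metadata property returns.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit of the input as "U+XXXX", separated by spaces.
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        // Hypothetical stand-in for the value read from the PDF metadata.
        string author = "Cak\u0131roglu";

        // A correctly read dotless i appears as U+0131 in the dump;
        // anything else in that position is what the library actually returned.
        Console.WriteLine(Describe(author));
    }
}
```

If the dump shows a control character or a different code point where U+0131 is expected, the character was altered while reading the metadata, not merely displayed badly.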

Playing with the encodings did not help resolve this issue. So I created another .NET Fiddle to test your alternative, iTextSharp: sample using iTextSharp

This time everything worked as it should: ElifCakıroglu

Note: I also tried creating a Word 2016 document, saving it as a PDF, and using that file with the samples above. Neither of them worked (the file was not accepted as a valid PDF) for some reason. After that I tried an online PDF document validator, which reported the file as fine. Then I used an online converter to change the PDF version with the default settings and ran both samples on the output PDF; surprisingly, both of them worked correctly.

My conclusion is that iTextSharp works consistently with both documents whose metadata contains Turkish characters, while IronPDF read only one of the two correctly.

ANSWER

I believe that this issue is resolved and can be tested in the 2020.9 release branch of IronPdf.

https://www.nuget.org/packages/IronPdf/