C#: Extracting text with PdfSharp returns unreadable content


I managed to extract text from a PDF version 1.2 file using PdfSharp, following the approach described in this link.

My code to extract the text:

    // Requires: using PdfSharp.Pdf.Content.Objects;

    // Recursively walks a parsed content stream and appends the string operands
    // of the text-showing operators (Tj, TJ) to pdfcontentstr, separated by ';'.
    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator cOperator)
        {
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence cSequence)
        {
            foreach (var element in cSequence)
            {
                ExtractText(element, ref pdfcontentstr);
            }
        }
        else if (cObject is CString cString)
        {
            // Value is the string as stored in the content stream.
            pdfcontentstr = pdfcontentstr + ";" + cString.Value;
        }
        return pdfcontentstr;
    }

But when I try to extract text from a PDF version 1.3 file (with the same content), the program returns unreadable content, for example:

0%0O0R0F0N00%0

The actual content in the PDF file: Block B

Can anyone help? Thanks in advance.
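
One observation that may help narrow this down, offered as a sketch rather than a confirmed diagnosis: the output looks like 2-byte glyph indices rather than character codes. If each "0" in the output stands for a zero byte, then '%' (0x25), 'O' (0x4F), 'R' (0x52), 'F' (0x46) and 'N' (0x4E) are each exactly 29 below 'B', 'l', 'o', 'c' and 'k'. That pattern is typical of an embedded TrueType font written with an Identity-H encoding, where the reliable mapping back to text is the font's ToUnicode CMap. The snippet below only checks this hypothesis on the sample. `DecodeWithOffset`, the fixed offset of 29, and the assumed byte layout (including an unprintable 0x03 for the space) are illustrative assumptions, not part of PdfSharp and not a general decoder.

```csharp
using System;
using System.Text;

public static class GlyphIdSketch
{
    // Hypothetical helper: reads the string as big-endian 2-byte glyph IDs
    // and adds a fixed offset to each ID to get a character code.
    // ASSUMPTION: the offset of 29 fits this sample only; other fonts will
    // differ, and a real fix should read the font's ToUnicode CMap instead.
    public static string DecodeWithOffset(string raw, int offset = 29)
    {
        var sb = new StringBuilder();
        for (int i = 0; i + 1 < raw.Length; i += 2)
        {
            int glyphId = (raw[i] << 8) | raw[i + 1];
            sb.Append((char)(glyphId + offset));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // ASSUMPTION: the bytes behind the garbled sample, with the "0"
        // characters taken as zero bytes and glyph 0x0003 as the space.
        string raw = "\0%\0O\0R\0F\0N\0\u0003\0%";
        Console.WriteLine(DecodeWithOffset(raw)); // prints "Block B"
    }
}
```

If the sample decodes this way, the strings returned by `CString.Value` are glyph IDs, and the text would have to be mapped through the ToUnicode entry of the font selected by the preceding Tf operator rather than read directly.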
