I managed to extract text from a PDF (version 1.2) using PdfSharp, following the approach described in this link.
My code to extract the text:
    private string ExtractText(CObject cObject, ref string pdfcontentstr)
    {
        if (cObject is COperator)
        {
            var cOperator = cObject as COperator;
            if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
                cOperator.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (var cOperand in cOperator.Operands)
                {
                    ExtractText(cOperand, ref pdfcontentstr);
                }
            }
        }
        else if (cObject is CSequence)
        {
            var cSequence = cObject as CSequence;
            foreach (var element in cSequence)
            {
                ExtractText(element, ref pdfcontentstr);
            }
        }
        else if (cObject is CString)
        {
            var cString = cObject as CString;
            pdfcontentstr = pdfcontentstr + ";" + cString.Value;
        }
        return pdfcontentstr;
    }
But when I try to extract text from a PDF version 1.3 file (with the same content), the program returns unreadable content, for example:
0%0O0R0F0N00%0
The actual content of the PDF file is: Block B
Can anyone help? Thanks in advance.