I cannot detect blank pages in a PDF file. I have searched the internet but could not find a good solution.
Using iTextSharp, I tried checking the page size and the XObjects, but neither gives an exact result.
I tried:
if (xobjects == null || textContent == null || contentBytes.Length < 20)
    // treat the page as blank
else
    // treat the page as not blank
But most of the time it returns the wrong answer.
The code is below. I am using the iTextSharp library.
For the XObjects:
PdfDictionary xobjects = resourceDic.GetAsDict(PdfName.XOBJECT);
// Here resourceDic is a PdfDictionary.
// I know that if xobjects is null then the page is blank, but sometimes a blank page returns xobjects that is not null.
For the content stream:
RandomAccessFileOrArray f = reader.SafeFile;
//here reader = new PdfReader(filename);
byte[] contentBytes = reader.GetPageContent(pageNum, f);
// I measured the size of contentBytes, but sometimes it is more than 20 bytes for a blank page.
For the text content:
String extractedText = PdfTextExtractor.GetTextFromPage(reader, pageNum, new LocationTextExtractionStrategy());
// Sometimes a blank page gives text more than 20 characters long.
I suspect you have already tried .Trim() on your strings, so I won't suggest that on its own.
What are the actual contents of the 20+ character strings on the blank pages? I suspect they are just newline characters (like what happens when people press Enter 10+ times to get a new page rather than inserting a page break), in which case you should strip the newline characters before checking the length.
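As a minimal sketch of that stripping step, something along these lines could work. The class and method names (`BlankPageText`, `StripLineBreaks`) are hypothetical helpers for illustration, not part of iTextSharp; the code only uses standard .NET string methods.

```csharp
using System;

public static class BlankPageText
{
    // Hypothetical helper (not an iTextSharp API): removes carriage returns
    // and line feeds, then trims any remaining leading/trailing whitespace.
    public static string StripLineBreaks(string text)
    {
        if (text == null)
        {
            return string.Empty;
        }
        return text.Replace("\r", "").Replace("\n", "").Trim();
    }
}
```

You would then call `BlankPageText.StripLineBreaks(extractedText)` on the result of `GetTextFromPage` and inspect the length of what is left.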
Let us know what the output contains after this.
Another possibility is that the text consists of non-breaking spaces and other characters that look like spaces but aren't, which you would otherwise have to find and replace manually. At that point I would instead suggest a regex match for [0-9a-zA-Z] (no commas inside the brackets, otherwise a literal comma also matches) and use that to determine whether your page is blank.
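A rough sketch of the regex check is below. `BlankPageCheck` and `LooksBlank` are hypothetical names used only for illustration; the only real API used is `System.Text.RegularExpressions.Regex`. It assumes the pages contain ASCII text, so a page with only non-Latin characters would be reported as blank; a class such as `[\p{L}\p{N}]` might suit that case better.

```csharp
using System.Text.RegularExpressions;

public static class BlankPageCheck
{
    // Matches any single ASCII letter or digit.
    private static readonly Regex Alphanumeric = new Regex("[0-9a-zA-Z]");

    // Hypothetical helper: the extracted text counts as blank when it is
    // null, empty, or contains no ASCII letter or digit at all.
    public static bool LooksBlank(string extractedText)
    {
        return string.IsNullOrEmpty(extractedText)
            || !Alphanumeric.IsMatch(extractedText);
    }
}
```

This only looks at the extracted text, so a page that holds nothing but an image would still need the XObject check from the question.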