PDF Readability check programmatically


I am working on a tool to identify PDF files (scanned documents in PDF form) that are not readable to the human eye, that is, files that are blurry or unclear (low DPI). The tool is needed because there are millions of files, and it is very difficult for us to open them one by one to check whether each is readable.

What I have tried: I used the Spire.PDF library to extract the images from each PDF and compare the DPI of each image with 150 DPI, which is our required standard. If an image's DPI is below the standard, I mark it as "Not clear Image".

The problem I am facing with this solution: some images in the PDFs have a DPI below 150 but are clear, while others have a good DPI, so the tool marks them as good, yet they are blurry or unclear.
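To make the check concrete, the DPI comparison described above amounts to roughly the following. This is a Python sketch, not the Spire.PDF code: the function names are hypothetical, and it assumes the image's pixel size and its placed size on the page (in PDF points, 1/72 inch) are already known.

```python
REQUIRED_DPI = 150      # the threshold stated in the question
POINTS_PER_INCH = 72    # default PDF user-space unit is 1/72 inch


def effective_dpi(pixel_width, pixel_height, width_pt, height_pt):
    """Pixels per inch of an image at the size it is placed on the page.

    Hypothetical helper: takes the lower of the horizontal and vertical
    resolution so a stretched image is judged by its worse axis.
    """
    dpi_x = pixel_width / (width_pt / POINTS_PER_INCH)
    dpi_y = pixel_height / (height_pt / POINTS_PER_INCH)
    return min(dpi_x, dpi_y)


def passes_dpi_check(pixel_width, pixel_height, width_pt, height_pt):
    """True when the image meets the required DPI; says nothing about focus."""
    return effective_dpi(pixel_width, pixel_height, width_pt, height_pt) >= REQUIRED_DPI


# A 1275 x 1650 pixel scan filling a US Letter page (612 x 792 pt) passes,
# while an 850 x 1100 pixel scan of the same page does not.
letter_ok = passes_dpi_check(1275, 1650, 612, 792)
letter_low = passes_dpi_check(850, 1100, 612, 792)
```

DPI only measures pixel density. An out-of-focus scan saved at a high resolution still passes, and a crisp low-resolution scan fails, which matches the mismatch described above.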

Has anyone worked on a similar requirement? If you have any suggestions, I would be grateful.


There is 1 best solution below


You can achieve this by converting each PDF page to an image file using our PdfViewer library and then identifying blurry images with the open-source OpenCVSharp library. The sample below illustrates this:

Identify blur image after converting PDF to image

Note: The smaller the value returned by CalculateBlurriness() (the closer to zero), the sharper the image.
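The linked sample's CalculateBlurriness() is not reproduced here, so the following is only a sketch of one measure that behaves the way the note describes: the inverse of the mean Sobel gradient energy. It is written in plain Python on a grayscale image given as a list of rows; the function name `blurriness` and the tiny test images are assumptions for illustration, not the sample's code.

```python
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def blurriness(gray):
    """Inverse mean gradient energy of a grayscale image (rows of 0..255).

    Sharp images have strong edges, hence high gradient energy and a
    score close to zero. Blurry or blank images score higher.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")
    energy = 0.0
    # Apply both 3x3 Sobel kernels at every interior pixel.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            energy += gx * gx + gy * gy
    mean_energy = energy / ((height - 2) * (width - 2))
    return 1.0 / (mean_energy + 1e-6)  # epsilon avoids division by zero


# Three 8x8 test images: a hard black/white edge, the same edge smeared
# into a ramp, and a uniformly gray page with no detail at all.
sharp_edge = [[0 if x < 4 else 255 for x in range(8)] for _ in range(8)]
soft_edge = [[x * 255 // 7 for x in range(8)] for _ in range(8)]
flat = [[128] * 8 for _ in range(8)]

scores = [blurriness(img) for img in (sharp_edge, soft_edge, flat)]
```

Here the hard edge gets the smallest score and the blank page the largest. Because the score depends on page content as well as focus (a mostly empty page looks "blurry" to it), any pass/fail threshold would need to be calibrated on a sample of your own scans.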

Syncfusion does not provide direct support or a ready-made solution for this requirement. This is only a suggestion for identifying blurry images, and the results may vary depending on the image files.