How do we check in UiPath whether a PDF file is a first-generation document?


Is there a way or an activity in UiPath to check whether a PDF file is a first-generation document? Any ideas or help would be much appreciated. Thank you.


This is more of a hack than a proper solution, but it should work: use the Digitize activity from the IntelligentOCR package with an OCR engine that you know returns word confidences (I think Microsoft OCR does, but double-check). The Digitize activity decides for itself whether it needs OCR. If no OCR is used (meaning the document is native, or "first generation" as you call it), then all OCRConfidences in the DOM will be -1.
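To make the check concrete, here is a minimal sketch of the "all confidences are -1" test. It assumes you have exported the DOM to JSON and loaded it as nested dicts and lists, and that each word carries a numeric field named `OcrConfidence`; both the export step and that field name are assumptions for illustration, so adjust them to whatever your DOM actually contains. Inside a workflow you would express the same idea as an expression over the DOM's words.

```python
from typing import Any, Iterator

# Value the answer above says is reported when no OCR engine was used.
NATIVE_MARKER = -1


def iter_confidences(node: Any, key: str = "OcrConfidence") -> Iterator[float]:
    """Yield every numeric value stored under `key` anywhere in a nested dict/list.

    The key name is hypothetical; pass a different one if your export differs.
    """
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, (int, float)):
                yield value
            else:
                yield from iter_confidences(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_confidences(item, key)


def looks_native(dom: dict) -> bool:
    """Return True only if the DOM has words and every confidence equals -1."""
    values = list(iter_confidences(dom))
    # An empty DOM proves nothing, so treat it as "not known to be native".
    return bool(values) and all(v == NATIVE_MARKER for v in values)


if __name__ == "__main__":
    # Hand-written sample data, not real Digitize output.
    sample = {"Pages": [{"Words": [{"Text": "Invoice", "OcrConfidence": -1}]}]}
    print(looks_native(sample))
```

Requiring every value to be -1, rather than just one, is a deliberately strict choice: a document with any real confidence value is treated as having gone through OCR.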

There are two caveats to doing this:

  • The Digitize activity may run OCR on a native PDF as well in rare edge cases where it decides the document text is unreadable (for instance, because of very unusual custom fonts).
  • While currently not supported, the Digitize activity may at some point in the future perform partial OCR, for instance when a native PDF contains an image with text.

As with any "undocumented feature", use this with caution: it may break at any time when you upgrade to a new version.