I am using Document Understanding in UiPath to extract data from multiple PDFs. Each PDF file contains multiple copies of the same page, which I cannot remove. The trouble is:
1.) The Regex Extractor is extracting data from all the pages of the PDF file. I only want the data from the first page.
2.) It is also extracting irrelevant data that sits below the required data.
Because I cannot remove the duplicate pages from the PDF file, I cannot use the ML Extractor, as it has a limit of 2 pages and 4 MB in size. Currently I am using the Form Extractor and the Regex Extractor, and both of them extract data from all the pages of the PDF file.
For some fields, irrelevant data is extracted along with the required value (this happens only when I use the Regex Extractor). How can I solve these 2 problems?
Any help is appreciated!
I'd recommend using the Intelligent Form Extractor, but note that it has limitations on a Community License, so follow the structure below.
You might want to split your PDF before the Digitization step so that you are only looking at page 1. You can always merge the pages back afterwards if required.
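If you stay with the Regex Extractor, tightening the pattern may also help with both symptoms. The sketch below is only an illustration in Python (UiPath itself uses .NET regular expressions, so the syntax may need small adjustments), and the sample text, field labels, and the `extract_first` helper are all made up for the example. It stops the capture at the end of the line so nothing below the value is picked up, and it keeps only the first match so repeated pages are ignored.

```python
import re
from typing import Optional

# Hypothetical digitized text: the same page appears twice, as in the question.
SAMPLE_TEXT = (
    "Invoice No: INV-001\n"
    "Total: 250.00\n"
    "Notes: pay within 30 days\n"
    "Invoice No: INV-001\n"
    "Total: 250.00\n"
    "Notes: pay within 30 days\n"
)


def extract_first(label: str, text: str) -> Optional[str]:
    """Return the value after 'label:' on the first line where it occurs."""
    # [^\r\n]* cannot cross a line break, so the capture ends with the line
    # instead of swallowing the text below it.
    pattern = re.compile(
        rf"^{re.escape(label)}:[ \t]*([^\r\n]*)", re.MULTILINE
    )
    # search() returns only the first occurrence, so later copies of the
    # page are never looked at.
    match = pattern.search(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_first("Invoice No", SAMPLE_TEXT))
    print(extract_first("Total", SAMPLE_TEXT))
```

This does not replace splitting the PDF; it only limits what a single pattern captures, assuming each value sits on one line next to its label.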