I have 100 PDFs stored in one location, and I want to extract text from them and store it in Excel. Below is an image of one of the PDFs; from it I want these fields (they are on page 1):
bid no, end date, item category, organisation name
Also needed:
OEM Average Turnover (Last 3 Years), Years of Past Experience required, MSE Exemption for Years Of Experience and Turnover, Startup Exemption for Years of Experience and Turnover, Estimated Bid Value, EMD Required




Tika is one of the Python packages that you can use to extract the data from your PDF files.
In the example below I'm using Tika and regular expressions to extract these five data elements:

Here is one way to write out the extracted data to a CSV file:
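A minimal sketch of the Tika-plus-regex approach, covering four of the fields from the question and the CSV output. The label patterns in `FIELD_PATTERNS` and the helper names (`extract_fields`, `pdf_to_text`, `pdfs_to_csv`) are assumptions for illustration, not the confirmed layout of your PDFs; print the text Tika returns for one file and adjust the patterns to match. Tika is imported inside `pdf_to_text` so the regex step can be tried on plain strings without a Java runtime.

```python
import csv
import re
from pathlib import Path

# Hypothetical label patterns: adjust them after inspecting the text
# that Tika actually returns for your PDFs.
FIELD_PATTERNS = {
    "bid_no": re.compile(r"Bid Number\s*:?\s*(\S+)", re.IGNORECASE),
    "end_date": re.compile(r"Bid End Date/Time\s*:?\s*(\d{2}-\d{2}-\d{4})", re.IGNORECASE),
    "item_category": re.compile(r"Item Category\s*:?\s*(.+)", re.IGNORECASE),
    "org_name": re.compile(r"Organisation Name\s*:?\s*(.+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict of field name -> first match, or None when nothing matches."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text or "")
        row[name] = match.group(1).strip() if match else None
    return row


def pdf_to_text(pdf_path):
    """Extract raw text with Tika (needs `pip install tika` and a Java runtime)."""
    from tika import parser  # lazy import: extract_fields works without Tika
    parsed = parser.from_file(str(pdf_path))
    return parsed.get("content") or ""


def pdfs_to_csv(pdf_dir, csv_path):
    """Parse every *.pdf in pdf_dir and write one CSV row per file."""
    fieldnames = ["file"] + list(FIELD_PATTERNS)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
            row = extract_fields(pdf_to_text(pdf_path))
            row["file"] = pdf_path.name
            writer.writerow(row)
```

Calling `pdfs_to_csv("path/to/pdfs", "bids.csv")` (both paths are placeholders) writes one row per PDF. A field whose pattern finds nothing stays `None`, which the `csv` module writes as an empty cell. The remaining fields from the question can be added as further entries in `FIELD_PATTERNS`.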
Here is the additional code that you asked for in the comments.
SPECIAL NOTE: Some PDFs don't have an org_name, so you will have to decide how to handle these, for example with N/A, None, or Null.
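One possible way to handle the missing values is to normalise each extracted row before writing it out. The helper below is a sketch; the name `fill_missing` and the sample row are hypothetical, and the placeholder is whatever you settle on.

```python
def fill_missing(row, placeholder="N/A"):
    """Return a copy of row with None or blank values replaced by placeholder."""
    return {
        key: placeholder if value is None or not str(value).strip() else value
        for key, value in row.items()
    }


# Made-up row for illustration: org_name was not found in the PDF.
row = {"bid_no": "GEM/0000/B/0000000", "org_name": None}
print(fill_missing(row))
```

Using a visible placeholder such as N/A makes the gaps easy to filter in Excel, while an empty cell is easier to treat as a true missing value if the CSV is loaded into other tools later.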