Searching inside the metadata of the PDF documents

Asked by Dundar on 07 June 2025 at 10:36 (125 views)

I have been using the Google Custom Search API for the following task:

Search for certain keywords combined with the "filetype:pdf" operator.

This works as expected, but it only searches within the text content of the PDF documents. I am trying to search within the metadata of the PDF documents, or within their content streams. I have searched a lot, and I think there is no way to do this with Google. Is there any other search engine with which I could achieve this?
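To illustrate what searching a content stream involves: page content is stored in `stream ... endstream` objects that are usually compressed with the FlateDecode (zlib) filter, so a search engine that indexes the extracted text never sees the raw operators. The following is a minimal standard-library sketch under simplifying assumptions: it locates streams with a regular expression instead of parsing the cross-reference table, treats any body that inflates as zlib data, and does not handle other filters (LZW, ASCII85) or encrypted files. The function names are hypothetical.

```python
import re
import zlib

# Body between the "stream" keyword (plus its EOL) and "endstream".
# Assumption: no stream body contains the literal bytes "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def iter_stream_bodies(pdf_bytes: bytes):
    """Yield every stream body, inflated when it looks like zlib data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # decompressobj tolerates the trailing EOL before "endstream".
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            # Not Flate-encoded (or another filter): search the raw bytes.
            yield body


def search_content_streams(pdf_bytes: bytes, keyword: bytes) -> list[int]:
    """Return the indices of the streams whose decoded body contains keyword."""
    return [
        index
        for index, body in enumerate(iter_stream_bodies(pdf_bytes))
        if keyword in body
    ]
```

This only works on files you already have locally, which is why a search engine alone cannot answer this kind of query.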

Thank you


Answered by Enok._.Seth on 07 October 2023 at 01:06

I found the following script on GitHub, but the repository has been archived. It combines several different approaches, and the script is no longer maintained, but I think that if you modify it using Selenium, PyPDF2, PyMuPDF, json, and other regex techniques, you can get there.

https://github.com/TebbaaX/Katana
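The regex and `json` route can be sketched for the document information dictionary (the `/Title`, `/Author`, `/Keywords` and similar entries). This is a hypothetical helper, not code from the Katana repository, and it makes loud simplifications: it scans the raw bytes rather than following the trailer's `/Info` reference (so bookmark `/Title` entries are collected too, which is why every key maps to a list), it misses entries inside compressed object streams, it ignores hex strings and unescaped nested parentheses, it undoes only the `\(`, `\)` and `\\` escapes, and it approximates PDFDocEncoding with Latin-1. PyPDF2 or PyMuPDF will be more reliable for real files.

```python
import json
import re

INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

# /Key (literal string); the string may contain backslash escapes such as \).
ENTRY_RE = re.compile(
    rb"/(" + rb"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL
)


def decode_pdf_string(raw: bytes) -> str:
    """Decode a PDF literal string (simplified: only \\( \\) \\\\ are unescaped)."""
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE text string with BOM
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")  # rough stand-in for PDFDocEncoding


def metadata_as_json(pdf_bytes: bytes) -> str:
    """Collect every matching /Key (value) pair into a JSON object of lists."""
    found: dict[str, list[str]] = {}
    for match in ENTRY_RE.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        found.setdefault(key, []).append(decode_pdf_string(match.group(2)))
    return json.dumps(found, sort_keys=True)
```

Keeping every occurrence as a list avoids guessing which dictionary is the "real" one; for keyword searching, checking all candidates is usually what you want.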

Along with Selenium, you can use PyMuPDF, PyPDF2, and BeautifulSoup (bs4):

https://pypi.org/project/PyMuPDF/

https://pypi.org/project/PyPDF2/

https://pypi.org/project/BeautifulSoup/

I don't know if this helps, but logically I think you would have to scrape the files and analyze them yourself to extract the metadata.
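Following that logic, the search engine is only used to find candidate PDF URLs (for example the `link` field of each Custom Search result item), and the metadata check happens on your side after downloading. Below is a hypothetical standard-library sketch: it downloads each URL, skips unreachable files and responses that are not PDFs, and keeps the URLs whose XMP metadata packet contains the keyword. XMP is only one place where PDF metadata lives (the document information dictionary is another), and the assumption that the packet is stored uncompressed holds for many files but not all.

```python
import re
import urllib.request
from typing import Callable, Iterable

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>;
# it is often stored uncompressed, so a byte-level scan can find it.
XMP_RE = re.compile(rb"<\?xpacket begin.*?<\?xpacket end[^>]*>", re.DOTALL)


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a file with urllib (swap in Selenium etc. if a site needs it)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def filter_by_xmp(
    urls: Iterable[str],
    keyword: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> list[str]:
    """Return the URLs whose XMP metadata contains keyword (ASCII case-insensitive)."""
    needle = keyword.lower().encode("utf-8")
    matches = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError:  # URLError, HTTPError and timeouts all derive from OSError
            continue
        if b"%PDF-" not in data[:1024]:
            continue  # not a PDF despite the search filter
        packets = XMP_RE.findall(data)
        if any(needle in packet.lower() for packet in packets):
            matches.append(url)
    return matches
```

Passing the downloader in as `fetch` keeps the scanning logic separate from how files are retrieved, so it can be tested offline or pointed at a Selenium-based scraper.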
