Accessing a Google Cloud bucket via FSCrawler (Elasticsearch)

The project I am currently working on needs a search engine for a few tens of thousands of PDF files. When a user searches the website for a keyword, the search engine should return snippets of the PDF files matching the search terms. The user then has the option to click a button to view the entire PDF file.
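
For the snippet part, Elasticsearch's highlighting feature can return the matching fragments directly, so no extra work is needed beyond the query. A minimal sketch with the Python client (8.x style), assuming FSCrawler's default mapping (the extracted text lives in the content field) and a hypothetical index name pdf_docs:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# FSCrawler puts the extracted text in the "content" field;
# "pdf_docs" is a placeholder for your FSCrawler job/index name.
resp = es.search(
    index="pdf_docs",
    query={"match": {"content": "some keyword"}},
    highlight={"fields": {"content": {"fragment_size": 150}}},
)

for hit in resp["hits"]["hits"]:
    # FSCrawler stores file metadata under the "file" object
    print(hit["_source"].get("file", {}).get("filename"))
    for fragment in hit.get("highlight", {}).get("content", []):
        print("  ...", fragment, "...")
```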

I figured that the best way to do this was Elasticsearch + FSCrawler (https://fscrawler.readthedocs.io/en/fscrawler-2.7/). I ran some tests today and was able to crawl a folder on my local machine.
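
For reference, the local test boils down to a small job settings file, which in FSCrawler 2.7 lives at ~/.fscrawler/<job_name>/_settings.yaml. A sketch (the job name, path, and update rate are placeholders):

```yaml
name: "pdf_docs"
fs:
  url: "/path/to/local/pdfs"   # folder FSCrawler watches
  update_rate: "15m"
  includes:
    - "*/*.pdf"
elasticsearch:
  nodes:
    - url: "http://127.0.0.1:9200"
```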

For serving the PDF files via the website, I figured I could store them in a Google Cloud Storage bucket and let users view them through links to that bucket. However, FSCrawler does not seem to be able to access the bucket. Any tips or ideas on how to solve this? Feel free to criticize the approach described above; if there are better ways to let users of the website access the PDF files, I would love to hear them.
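
If the bucket is not meant to be public, one common pattern is to have the website hand out short-lived signed URLs instead of raw bucket links. A sketch using the google-cloud-storage Python client; the bucket and object names are placeholders:

```python
from datetime import timedelta

from google.cloud import storage


def pdf_view_url(bucket_name: str, object_name: str) -> str:
    """Return a short-lived download link for one PDF in the bucket."""
    client = storage.Client()  # uses application default credentials
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="GET",
    )


# e.g. pdf_view_url("my-pdf-bucket", "reports/annual-2020.pdf")
```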

Thanks in advance and kind regards!

1 Answer

You can use s3fs-fuse to mount the bucket into your file system and then use the normal local FS crawler. This works with Google Cloud Storage as well, because GCS exposes an S3-compatible XML API through its interoperability mode.
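
Roughly, the mount could look like this (a sketch: the bucket name, mount point, and HMAC interoperability keys are placeholders, and the exact s3fs options may need tuning for your setup):

```sh
# HMAC key pair generated under "Interoperability" in the GCS settings
echo ACCESS_KEY:SECRET_KEY > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket against the GCS endpoint, then point FSCrawler's
# fs.url at /mnt/pdfs as if it were a local folder
mkdir -p /mnt/pdfs
s3fs my-pdf-bucket /mnt/pdfs \
    -o passwd_file=${HOME}/.passwd-s3fs \
    -o url=https://storage.googleapis.com \
    -o sigv2
```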

Alternatively, you can fork FSCrawler and implement a crawler for S3, similar to the existing crawler-ftp module.