Using the GCS match_glob parameter to fetch specific files under a subdirectory in Python


I'm trying to fetch specific files from a given bucket:

my_bucket
  dirA
    dirX
      file1.json
      file2.json
      file1.csv

    dirY
      file2.csv

  dirZ
    dirX
      file3.json
      file3.csv

Using the Python SDK's match_glob parameter, I'd like to fetch only the files that conform to the **/dirX/**.json pattern.

I.e., I'd like to get dirA/dirX/file1.json, dirA/dirX/file2.json, and dirZ/dirX/file3.json.

Trying the **/dirX/**.json pattern yielded an empty result.

What is wrong with that pattern?

Thanks in advance!

There is 1 best solution below
Sathi Aiswarya On

You can try the pattern **/*.json instead of **/dirX/**.json. The latter returns an empty result because dirX is not a directory at the root of the bucket, but rather a subdirectory of both dirA and dirZ.

Passing **/*.json as the match_glob parameter fetches every .json file under any directory in the bucket. In the layout above, all of those files happen to sit in a dirX subdirectory, so the result is the three files you listed.
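As a rough sketch of how this could look with the google-cloud-storage client (assuming a version whose Client.list_blobs accepts match_glob, and that default credentials are configured): the helper name list_json_in_dir and its dir_name argument are hypothetical, and the client-side check on dirX is an extra safety net that is not part of the answer above. It keeps the result limited to dirX even if other directories later gain .json files.

```python
from typing import List


def list_json_in_dir(client, bucket_name: str, dir_name: str = "dirX",
                     glob: str = "**/*.json") -> List[str]:
    """Return names of objects matching `glob` that have `dir_name` as a parent directory."""
    # match_glob is forwarded to the server-side objects list operation.
    blobs = client.list_blobs(bucket_name, match_glob=glob)
    # Client-side filter (an assumption of this sketch, not a GCS feature):
    # keep only objects whose path contains dir_name as a directory segment.
    return [blob.name for blob in blobs if dir_name in blob.name.split("/")[:-1]]


def main() -> None:
    # Imported here so the helper above can be tried without the library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # assumes application default credentials
    for name in list_json_in_dir(client, "my_bucket"):
        print(name)
```

Calling main() against the bucket from the question should print the three .json objects under dirA/dirX and dirZ/dirX, provided the glob behaves as described above.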

When the matchGlob query parameter is set to a glob pattern, the objects list operation returns only the objects that match that pattern in items[]. For details, see "List objects and prefixes using glob" in the Cloud Storage documentation.
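To make the query parameter concrete, here is a minimal sketch that only builds the request URL for the JSON API objects list operation. The helper name build_list_url is hypothetical, the bucket name is taken from the question, and authentication headers are deliberately left out.

```python
from urllib.parse import quote, urlencode


def build_list_url(bucket: str, match_glob: str) -> str:
    """Build an objects list URL with the matchGlob query parameter set."""
    # Bucket name goes in the path; the glob is percent-encoded in the query string.
    base = f"https://storage.googleapis.com/storage/v1/b/{quote(bucket, safe='')}/o"
    return f"{base}?{urlencode({'matchGlob': match_glob})}"


url = build_list_url("my_bucket", "**/*.json")
print(url)
```

A GET request to this URL with a valid access token would return the matching objects in items[], which is what the Python client's match_glob argument relies on.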