Wildcard folder listing with gsutil


I am trying to list GCS folders whose names start with a fixed string followed by alphanumeric characters. I don't want to do a recursive listing.

Let's say we have the following folder structure (I know that internally there is no concept of a folder; it is just a path prefix):

gs://somebucket/monitor/a
gs://somebucket/monitor/a/a1.log.gz
gs://somebucket/monitor/a/a2.log.gz
gs://somebucket/monitor/b
gs://somebucket/monitor/b/b1.log.gz
gs://somebucket/monitor/b/b2.log.gz
gs://somebucket/monitor/c
gs://somebucket/monitor1/x
gs://somebucket/monitor1/y
gs://somebucket/monitor1/z

The output I want is:

gs://somebucket/monitor
gs://somebucket/monitor1

I have tried the following:

$ gsutil ls gs://somebucket/monitor*

And

$ gsutil ls gs://somebucket/monitor**

But neither gives the required output.

Is there a way in gsutil to achieve the desired output?

mhouglum (best answer)

gsutil will only list objects when using the ** wildcard, meaning that unless there's an object at the path monitor in somebucket, it won't print just gs://somebucket/monitor. Given this, there are a couple of approaches: use the JSON API directly (supplying the desired prefix and "/" as the delimiter), or use gsutil without the ** wildcard and do some extra string processing with grep, Python, or your scripting tool of choice.
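To illustrate what the JSON API's prefix/delimiter listing computes, here is a minimal local sketch in Python. It does not call GCS; it models the objects.list behaviour on a plain list of object names: every name under the prefix that contains the delimiter after the prefix collapses into a single entry in `prefixes`, and the rest come back as objects. The function name `list_with_delimiter` is hypothetical, and the sample names assume each path from the question is an object. With the google-cloud-storage client library the real call would be roughly `client.list_blobs("somebucket", prefix="monitor", delimiter="/")`, with the collapsed names available in the iterator's `prefixes` attribute once its pages have been consumed.

```python
def list_with_delimiter(names, prefix, delimiter="/"):
    """Hypothetical local model of a prefix/delimiter listing.

    Returns (objects, prefixes): names under `prefix` with no delimiter after
    the prefix are objects; the others are cut just past the first delimiter
    and de-duplicated, mirroring the JSON API's `prefixes` field.
    """
    objects, prefixes = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the requested prefix
        rest = name[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            objects.append(name)  # no delimiter: a "direct" object
        else:
            # keep everything up to and including the first delimiter
            prefixes.add(name[: len(prefix) + idx + len(delimiter)])
    return objects, sorted(prefixes)


if __name__ == "__main__":
    # Object names from the question (assumed), relative to gs://somebucket/
    names = [
        "monitor/a", "monitor/a/a1.log.gz", "monitor/a/a2.log.gz",
        "monitor/b", "monitor/b/b1.log.gz", "monitor/b/b2.log.gz",
        "monitor/c", "monitor1/x", "monitor1/y", "monitor1/z",
    ]
    objects, prefixes = list_with_delimiter(names, "monitor")
    for p in prefixes:
        print("gs://somebucket/" + p.rstrip("/"))
```

The prefixes keep their trailing "/" (as the API returns them), so the usage example strips it to match the output the question asks for.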

A quick example of a script that would do this:

# Say I want the top-level names starting with "201", but the bucket has other objects too:
$ gsutil ls gs://my-bucket/**
gs://my-bucket/other-thing
gs://my-bucket/2015/01/01/foo.jpg
gs://my-bucket/2016/12/25/christmas.jpg

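$ export PATTERN="gs://my-bucket/201"
# List the parent path (everything before the last "/"), then keep only names starting with $PATTERN, cut at the next "/":
$ gsutil ls "$(python3 -c 'import sys; p = sys.argv[1]; print(p[:p.rfind("/")])' "$PATTERN")" | grep -o "$PATTERN[^/]*"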
gs://my-bucket/2015
gs://my-bucket/2016
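If you would rather do the string processing entirely in Python instead of grep, a minimal sketch could look like this. It assumes you already have the lines printed by a non-recursive `gsutil ls` of the parent path; the sample lines below are what that listing would be expected to show for the example bucket, not captured output, and the function names are hypothetical. Unlike the grep pipeline, it anchors the match at the start of each line and drops duplicates.

```python
import re


def parent_url(pattern):
    """Everything before the last '/', e.g. the bucket or folder to list."""
    return pattern[: pattern.rfind("/")]


def top_level_matches(ls_lines, pattern):
    """Return pattern + following non-'/' characters, once each, in order."""
    regex = re.compile(re.escape(pattern) + r"[^/]*")
    found = []
    for line in ls_lines:
        m = regex.match(line.strip())  # anchored at the start of the line
        if m and m.group(0) not in found:
            found.append(m.group(0))
    return found


if __name__ == "__main__":
    pattern = "gs://my-bucket/201"
    # Assumed output of: gsutil ls gs://my-bucket
    ls_lines = [
        "gs://my-bucket/other-thing",
        "gs://my-bucket/2015/",
        "gs://my-bucket/2016/",
    ]
    print("listing", parent_url(pattern))
    for url in top_level_matches(ls_lines, pattern):
        print(url)
```

`re.escape` keeps characters such as "." in bucket names from being treated as regex metacharacters, which the unescaped grep pattern does not guard against.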
DFeng

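It is possible that you're using zsh as your shell. zsh treats the unquoted * as a filename glob and tries to match it against local files before the argument ever reaches gsutil; when nothing matches, zsh by default stops with a "no matches found" error. Try gsutil ls 'gs://somebucket/monitor*' (note the single quotes), which passes the wildcard to gsutil unchanged.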
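The same quoting issue shows up when gsutil is invoked from a script. A minimal Python sketch of two ways to keep the wildcard away from the local shell (the helper names are hypothetical): `shlex.quote` adds the single quotes when you must build a command string, and passing an argument list to `subprocess.run` without `shell=True` skips the shell, and therefore its glob expansion, entirely. Running the second helper assumes gsutil is installed and on the PATH.

```python
from __future__ import annotations

import shlex
import subprocess


def quoted_ls_command(url: str) -> str:
    """Build a shell command string with the URL safely single-quoted."""
    return "gsutil ls " + shlex.quote(url)


def run_gsutil_ls(url: str) -> list[str]:
    """Run gsutil ls without a shell, so '*' reaches gsutil untouched."""
    result = subprocess.run(
        ["gsutil", "ls", url],  # argument list: no shell, no globbing
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(quoted_ls_command("gs://somebucket/monitor*"))
```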