Wildcard folder listing with gsutil

I am trying to list GCS folders whose names start with a fixed string followed by an alphanumeric character. I don't want to do a recursive listing.

Let's say we have the following folder structure (I know that internally there is no concept of a folder; it is just a path prefix):

gs://somebucket/monitor/a
gs://somebucket/monitor/a/a1.log.gz
gs://somebucket/monitor/a/a2.log.gz
gs://somebucket/monitor/b
gs://somebucket/monitor/b/b1.log.gz
gs://somebucket/monitor/b/b2.log.gz
gs://somebucket/monitor/c
gs://somebucket/monitor1/x
gs://somebucket/monitor1/y
gs://somebucket/monitor1/z

The output I want is:

gs://somebucket/monitor
gs://somebucket/monitor1

I have tried the following:

$ gsutil ls gs://somebucket/monitor*

And

$ gsutil ls gs://somebucket/monitor**

But neither gives the required output.

Is there a way in gsutil to achieve the desired output?

There are 2 best solutions below

Best answer

gsutil will only list objects when using the ** wildcard, meaning that unless there's an object at the path monitor in somebucket, it won't just print gs://somebucket/monitor. Given this, there are a couple of approaches: either use the JSON API directly (supplying the desired prefix and using "/" as the delimiter), or use gsutil without the ** wildcard and do some extra string processing via grep/Python/<your scripting tool of choice here>.

A quick example of a script that would do this:

# Say I want the objects starting with "201", but have others:
$ gsutil ls gs://my-bucket/**
gs://my-bucket/other-thing
gs://my-bucket/2015/01/01/foo.jpg
gs://my-bucket/2016/12/25/christmas.jpg

$ export PATTERN="gs://my-bucket/201"
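# Cut the pattern at its last "/" to get the parent path, list that non-recursively, then keep only the matching first-level names:
$ gsutil ls "$(python -c "print(\"${PATTERN}\"[0:\"${PATTERN}\".rfind('/')])")" | grep -o "$PATTERN[^/]*"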
gs://my-bucket/2015
gs://my-bucket/2016
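
As a rough illustration of the prefix-plus-"/"-delimiter idea mentioned above, here is a minimal Python sketch that emulates that kind of listing over an in-memory list of object names instead of calling GCS. The function name list_prefixes and the hard-coded names list are hypothetical, made up for this example; the names simply mirror the paths from the question, relative to the bucket root.

```python
def list_prefixes(object_names, prefix, delimiter="/"):
    """Emulate a prefix + delimiter listing over a flat list of object names.

    Returns the distinct "folder" prefixes (each ending with the delimiter)
    that start with `prefix`. This is an illustration only, not a GCS call.
    """
    found = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        # Find the first delimiter at or after the end of the prefix.
        cut = name.find(delimiter, len(prefix))
        if cut != -1:
            # Everything up to and including that delimiter is one "folder".
            found.add(name[:cut + len(delimiter)])
    return sorted(found)


# Hypothetical data: the object names from the question, relative to the bucket.
names = [
    "monitor/a",
    "monitor/a/a1.log.gz",
    "monitor/a/a2.log.gz",
    "monitor/b",
    "monitor/b/b1.log.gz",
    "monitor/b/b2.log.gz",
    "monitor/c",
    "monitor1/x",
    "monitor1/y",
    "monitor1/z",
]

for p in list_prefixes(names, "monitor"):
    print("gs://somebucket/" + p)
```

With the data above this prints gs://somebucket/monitor/ and gs://somebucket/monitor1/, each with a trailing slash because a collapsed prefix includes the delimiter. The grep -o step in the shell example does the same job on text: it cuts each listed name at the first "/" after the pattern.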

Second answer

It is possible that you're using zsh as your shell. zsh tries to expand an unquoted * as a local filename glob before the argument ever reaches gsutil, and by default it aborts with "no matches found" when nothing matches locally. Try gsutil ls 'gs://somebucket/monitor*' and that should work (note the single quotes).