Possible to access an AWS public dataset using Cyberduck?


Cyberduck version: 7.9.2

Cyberduck is designed to access non-public AWS buckets. It asks for:

  • Server
  • Port
  • Access Key ID
  • Secret Access Key

The Registry of Open Data on AWS provides this information for an open dataset (using the example at https://registry.opendata.aws/target/):

  • Resource type: S3 Bucket
  • Amazon Resource Name (ARN): arn:aws:s3:::gdc-target-phs000218-2-open
  • AWS Region: us-east-1
  • AWS CLI Access (No AWS account required): aws s3 ls s3://gdc-target-phs000218-2-open/ --no-sign-request

Is there a version of s3://gdc-target-phs000218-2-open that can be used in Cyberduck to connect to the data?
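For comparison, the `--no-sign-request` CLI call above corresponds to a plain, unsigned HTTPS request against the bucket. The following is a minimal sketch using only the Python standard library, assuming the bucket allows anonymous listing (as the CLI example suggests) and that the virtual-hosted endpoint form `https://<bucket>.s3.<region>.amazonaws.com` applies. The function names `build_list_url`, `parse_keys`, and `list_public_keys` are hypothetical helpers, not part of any AWS or Cyberduck API.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket, region, prefix="", max_keys=20):
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{S3_NS}Key")]


def list_public_keys(bucket, region, prefix="", max_keys=20):
    """Fetch one page of keys without any credentials (needs network access)."""
    url = build_list_url(bucket, region, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_keys(resp.read().decode("utf-8"))
```

Calling `list_public_keys("gdc-target-phs000218-2-open", "us-east-1")` should return the first few keys if anonymous listing is permitted. This only shows that no credentials are needed at the protocol level; it says nothing about what Cyberduck itself accepts.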


There are 2 best solutions below


If the bucket is public, any AWS credentials will suffice. So as long as you can create an AWS account, you only need to create an IAM user for yourself with programmatic access, and you are all set.

Admittedly, this is a pain because creating an AWS account requires a credit (or debit) card. But see https://stackoverflow.com/a/44825406/1094109

I tried this with s3://gdc-target-phs000218-2-open and it worked:

[Screenshot: roda-gdc-target-phs000218-2-open]

For RODA buckets that provide public access to specific prefixes, you'd need to edit the path to suit. E.g. s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/ (this is a RODA bucket maintained by us, yet to be released fully)

NOTE: The screenshots below show a different URL that's no longer operational. The correct URL is s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/

[Screenshot: roda-cellpainting-gallery]
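When adjusting the path for a prefix-restricted bucket, it can help to separate the bucket name from the key prefix. A small sketch of that split, using only the standard library; `split_s3_uri` is a hypothetical helper name, and how the two parts map onto Cyberduck's connection fields is not covered here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The bucket is the host part; the prefix is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/")
print(bucket)  # cellpainting-gallery
print(prefix)  # cpg0000-jump-pilot/source_4/
```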


No. The documentation explicitly states:

"You must obtain the login credentials [in order to connect to Amazon S3 in Cyberduck]"