AWS credentials required for Common Crawl S3 buckets

I'm trying to access the Common Crawl news S3 bucket, but I keep getting a "fatal error: Unable to locate credentials" message. Any suggestions for how to get around this? As far as I was aware, Common Crawl doesn't even require credentials.

From News Dataset Available – Common Crawl:

You can access the data even without an AWS account by adding the command-line option --no-sign-request.

I tested this by launching a new Amazon EC2 instance (without an IAM role) and issuing the command:

aws s3 ls s3://commoncrawl/crawl-data/CC-NEWS/

It gave me the error: Unable to locate credentials

I then ran it with the additional parameter:

aws s3 ls s3://commoncrawl/crawl-data/CC-NEWS/ --no-sign-request

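It successfully listed the directories. By default the AWS CLI signs every request, so it looks for credentials even when the bucket is public; --no-sign-request skips signing and sends the request anonymously.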
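
If you run this often, the flag is easy to forget. Here is a minimal sketch of a shell helper that always appends it. The function name cc_ls is hypothetical (not part of the AWS CLI), and the default prefix is simply the CC-NEWS path from the commands above. It assumes the aws CLI is installed and that the bucket still permits anonymous requests, which depends on its current access policy.

```shell
# Hypothetical helper (not part of the AWS CLI): list a prefix in the
# commoncrawl bucket without signing the request.
cc_ls() {
    # Fall back to the CC-NEWS prefix when no argument is given.
    prefix="${1:-crawl-data/CC-NEWS/}"
    # --no-sign-request tells the CLI not to look for credentials.
    aws s3 ls "s3://commoncrawl/${prefix}" --no-sign-request
}
```

Calling cc_ls with no argument repeats the listing above; cc_ls crawl-data/ lists one level higher.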