How can I hide a custom origin server from the public when using AWS CloudFront?


I am not sure if this exactly qualifies for Stack Overflow, but since I need to do this programmatically, and I figure lots of people on SO use CloudFront, I think it does... so here goes:

I want to hide public access to my custom origin server.

CloudFront pulls from the custom origin; however, I cannot find documentation or any sort of example of preventing direct requests from users to my origin when it is proxied behind CloudFront, unless my origin is S3... which isn't the case with a custom origin.

What technique can I use to identify/authenticate that a request is being proxied through CloudFront instead of being directly requested by the client?

The CloudFront documentation only covers this case when used with an S3 origin. The AWS forum post that lists CloudFront's IP addresses has a disclaimer that the list is not guaranteed to be current and should not be relied upon. See https://forums.aws.amazon.com/ann.jspa?annID=910

I assume that anyone using CloudFront has some sort of way to hide their custom origin from direct requests / crawlers. I would appreciate any sort of tip to get me started. Thanks.
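One general technique (a sketch, not something from the answers below) is to have the proxy attach a shared-secret request header that only it and the origin know, and reject requests that lack it. The header name and secret value here are hypothetical placeholders:

```python
import hmac

# Hypothetical shared secret, configured both at the proxy (added to every
# origin request) and at the origin server.
ORIGIN_SECRET = "replace-with-a-long-random-value"
HEADER_NAME = "X-Origin-Secret"  # hypothetical header name

def is_from_proxy(headers: dict) -> bool:
    """Return True only if the request carries the correct shared secret.

    hmac.compare_digest does a constant-time comparison, avoiding timing
    side channels when checking secrets.
    """
    supplied = headers.get(HEADER_NAME, "")
    return hmac.compare_digest(supplied, ORIGIN_SECRET)

# A direct request without the header is rejected; a proxied one passes.
print(is_from_proxy({}))                            # False
print(is_from_proxy({HEADER_NAME: ORIGIN_SECRET}))  # True
```

The origin would return 403 whenever `is_from_proxy` is False, so direct hits on the origin hostname get nothing useful.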


There are 3 best solutions below


I would suggest using something similar to Facebook's robots.txt to prevent crawlers from accessing sensitive content on your website.

https://www.facebook.com/robots.txt (you may have to tweak it a bit)

After that, just point your app (e.g. Rails) to be the custom origin server.

Now rewrite all the URLs on your site to become absolute URLs like:

https://d2d3cu3tt4cei5.cloudfront.net/hello.html

Basically, all URLs should point to your CloudFront distribution. Now if someone requests https://d2d3cu3tt4cei5.cloudfront.net/hello.html and the distribution does not have hello.html cached, CloudFront fetches it from your server (over an encrypted channel such as HTTPS) and then serves it to the user.

So even if the user does a view-source, they see only your CloudFront distribution, not your origin server.

More details on setting this up here:

http://blog.codeship.io/2012/05/18/Assets-Sprites-CDN.html
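The URL-rewriting step above can be sketched as a small helper that turns site-relative asset paths into absolute CloudFront URLs (the distribution domain is the one from this answer; the helper name is illustrative):

```python
# The CloudFront distribution domain used as an example in this answer.
CDN_HOST = "https://d2d3cu3tt4cei5.cloudfront.net"

def cdn_url(path: str) -> str:
    """Rewrite a site-relative asset path to an absolute CDN URL."""
    return CDN_HOST + "/" + path.lstrip("/")

print(cdn_url("/hello.html"))
# https://d2d3cu3tt4cei5.cloudfront.net/hello.html
```

In a Rails app the same idea is usually handled by pointing the asset host setting at the distribution, so every generated asset URL comes out rewritten.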


Create a custom CNAME that only CloudFront uses. On your own servers, block any request for static assets not coming from that CNAME.

For instance, if your site is http://abc.mydomain.net, then set up a CNAME for http://xyz.mydomain.net that points to the exact same place, and put that new domain in CloudFront as the origin pull server. Then, on incoming requests, you can tell from the Host header whether the request came through CloudFront or not, and handle it however you want.

The downside is that this is security through obscurity. The client never sees requests for http://xyz.mydomain.net, but that doesn't mean they won't have some way of figuring it out.
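The check described above amounts to comparing the Host header against the CloudFront-only CNAME. A minimal sketch, using the hostnames from this answer (any real deployment would do this in the web server or app framework):

```python
# Hostnames from the example above.
PUBLIC_HOST = "abc.mydomain.net"           # what browsers use directly
CLOUDFRONT_ONLY_HOST = "xyz.mydomain.net"  # configured only as the CloudFront origin

def allow_static_asset(host_header: str) -> bool:
    """Serve static assets only when the request arrived via the
    CloudFront-only CNAME. As noted, this is security through obscurity:
    anyone who learns the CNAME can pass this check."""
    return host_header.lower() == CLOUDFRONT_ONLY_HOST

print(allow_static_asset("xyz.mydomain.net"))  # True
print(allow_static_asset("abc.mydomain.net"))  # False
```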


[I know this thread is old, but I'm answering it for people like me who see it months later.]

From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.

1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.

2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.

3) Go to Behaviors and click Create Behavior: Path Pattern: robots.txt Origin: (your new bucket)

4) Set the robots.txt behavior at a higher precedence (lower number).

5) Go to invalidations and invalidate /robots.txt.

Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
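For example, if you want to block all crawlers on the CloudFront domain while leaving your main domain's robots.txt untouched, the bucket's robots.txt could simply be:

```
User-agent: *
Disallow: /
```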

Another domain or subdomain will also work in place of a bucket, but why go to the trouble?