I thought this topic would be easy to find information on, but I was mistaken.
I am building a Terraform configuration with PaloAlto instances, which require user data and a curl over HTTPS to verify that the user data was applied.
When I configure the security group on the public interface to allow inbound/outbound HTTPS and SSH for 0.0.0.0/0, it works like a charm.
But I need to tighten the security and allow Inbound only from certain CIDRs.
What are the requirements for EC2 connectivity for successful user data bootstrap with tightest security group setup possible?
Thanks for your suggestions
It depends on what you mean by "user data bootstrap."
If you are referring to the instance's user data, that is fetched from the instance metadata service, which requires no security group rules because its IP address, 169.254.169.254, is subject to special rules.
User data (in this sense) is passed to the EC2 service API when the RunInstances action is called to request that the instance be launched. The EC2 service stores the user data so that the instance can fetch it from the instance metadata service after it starts. This means the instance never needs to fetch it from an external source. If this is what you are referring to, no security group configuration is necessary.
See also What's Special About 169.254.169.254?
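To make the metadata path concrete, here is a minimal Python sketch of how a process on the instance could read its own user data. It assumes the instance uses the token-based (IMDSv2) flow; the function names are my own, and this is only an illustration of the request sequence, not what the PaloAlto bootstrap actually runs.

```python
import urllib.request

# Link-local address of the instance metadata service (reachable only from the instance).
IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """IMDSv2 step 1: a PUT to /api/token returns a short-lived session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_user_data_request(token: str) -> urllib.request.Request:
    """IMDSv2 step 2: a GET to /user-data, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_user_data(timeout: float = 2.0) -> bytes:
    """Fetch the raw user data; this only succeeds when run on an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_user_data_request(token), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_user_data()` on the instance returns the user data bytes. Both requests go to 169.254.169.254, so none of this traffic depends on the inbound or outbound rules of the security group.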
But, since you mentioned PaloAlto, it looks like you might also need access to an S3 bucket, based on this:
A VPC Endpoint for S3 allows you to not only access S3 (within the same region as the endpoint) without going out an Internet Gateway, but also to restrict the instances using the VPC endpoint to accessing only specific buckets and performing specific actions. This is the only effective way to control restricted access to S3 as a whole, because otherwise it must be accessed via the Internet and its IP addresses are not static and thus can't be referenced in a security group. Even if IP-based restriction were possible, it would provide no meaningful security since data exfiltration could be accomplished by a malicious user, using their own bucket, since you'd be allowing connectivity to all of S3. There is no correlation of IP addresses to specific buckets.
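As a rough illustration of the "specific buckets and specific actions" restriction, the sketch below builds an endpoint policy document in Python. The bucket name `example-bootstrap-bucket`, the helper name, and the chosen actions are assumptions for the example; adjust them to whatever your bootstrap actually reads. The resulting JSON is the kind of document you would attach as the policy of the S3 VPC endpoint.

```python
import json


def s3_endpoint_policy(bucket: str, actions=("s3:GetObject", "s3:ListBucket")) -> dict:
    """Build a VPC endpoint policy allowing only the given actions on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": list(actions),
                # The bucket ARN covers ListBucket; the /* ARN covers the objects.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy_json = json.dumps(s3_endpoint_policy("example-bootstrap-bucket"), indent=2)
```

Because the policy names a single bucket, requests through the endpoint to any other bucket (including one owned by an attacker) are denied, which is the exfiltration control that an IP-based rule cannot provide.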