Constant 503 errors from Elasticsearch inside a VPC when using Kinesis Firehose


I am using the Amazon Kinesis Data Generator to send data to a test Kinesis Firehose Stream for indexing in an Elasticsearch Service cluster.

The data generator sends a fairly basic JSON doc for processing. The stream element works fine, as does the Lambda transformation. I've verified and been able to test everything up to this point. It's only when the request is made to Elasticsearch inside my VPC that I get an error.

It works fine if I switch the pipeline to use a public Elasticsearch domain, but when I use the Elasticsearch domain inside my VPC, I get a 503 error. The error is consistent on every single request, so it is not a capacity issue.

Here's an example of the error I'm seeing. Just a generic 503. I'm not sure if this is coming from the load balancer or the target (Elasticsearch itself).

{
 "deliveryStreamARN": "arn:aws:firehose:eu-west-2:xxx:deliverystream/firehose-test",
 "destination": "arn:aws:es:eu-west-2:xxx:domain/elasticsearch-test",
 "deliveryStreamVersionId": 1,
 "message": "Error received from Elasticsearch cluster. <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>",
 "errorCode": "ES.ServiceException",
 "processor": "arn:aws:lambda:eu-west-2:xxx:function:transform-test:$LATEST"
}
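
For anyone triaging many of these records, here is a minimal Python sketch that pulls the HTTP status out of the "message" field. The function name `summarize` and the regular expression are illustrative assumptions, not part of any AWS tooling, and the sample record is the one above with the wrapped message joined onto a single line. An HTML body, rather than the JSON error document Elasticsearch normally returns, hints that the response may have come from a layer in front of the cluster, but that is an inference and not something the record itself states.

```python
import json
import re

# Sample record from the question (message joined onto one line so it is valid JSON).
RECORD = r'''{
 "deliveryStreamARN": "arn:aws:firehose:eu-west-2:xxx:deliverystream/firehose-test",
 "destination": "arn:aws:es:eu-west-2:xxx:domain/elasticsearch-test",
 "deliveryStreamVersionId": 1,
 "message": "Error received from Elasticsearch cluster. <html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>",
 "errorCode": "ES.ServiceException",
 "processor": "arn:aws:lambda:eu-west-2:xxx:function:transform-test:$LATEST"
}'''


def summarize(record_json: str) -> dict:
    """Extract the error code and any HTTP status embedded in the HTML message."""
    record = json.loads(record_json)
    message = record.get("message", "")
    # Look for a status inside an <h1> heading, e.g. "<h1>503 Service Unavailable</h1>".
    match = re.search(r"<h1>(\d{3})\s+([^<]+)</h1>", message)
    return {
        "error_code": record.get("errorCode"),
        "http_status": int(match.group(1)) if match else None,
        "reason": match.group(2).strip() if match else None,
        # True when the body is HTML rather than a JSON error document.
        "html_body": "<html>" in message,
    }


summary = summarize(RECORD)
print(summary)
```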

Other applications inside the VPC are able to use the Elasticsearch service without a problem. This seems to be something specific to Firehose.

I have read and re-read the docs and can't figure out why the same data pipeline works fine on a public ES domain, but not on the ES domain inside my VPC. I've double-checked all roles, policies, security groups and subnets. Both Firehose and Elasticsearch are using the same VPC, security groups and subnets. The inbound rules on the security group allow HTTPS. It all looks correct, but I'm still getting errors and nothing is being indexed in Elasticsearch.

I've also read the AWS blog post "Ingest streaming data into Amazon Elasticsearch Service within the privacy of your VPC with Amazon Kinesis Data Firehose" about 15 times to no avail.

Best answer

So after a few days of pain I realised my mistake, which comes down to a quirk of the Firehose setup.

My Elasticsearch cluster is Multi-AZ. When creating the Firehose, I was just letting it choose the default Security Groups based on the pre-existing Elasticsearch Domain. I only had one security group defined on the Firehose. It needs two if your Elasticsearch is in a VPC.

  • One for Firehose outbound
  • One for Elasticsearch inbound (you probably already have this)

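Then they need to be linked through the security group rules: the Firehose group allows outbound HTTPS, and the Elasticsearch group allows inbound HTTPS from the Firehose group.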

The Firehose Delivery Stream setup wizard will not warn you that you only have one security group and that this won't work. (Perhaps it would work for an Elasticsearch Domain on a single-AZ but I haven't tested.)

You must create the two required Security Groups before you create the Firehose Delivery Stream.

Create the first Security Group for the Firehose endpoint, allowing outbound HTTPS (port 443) traffic. Then make sure the Elasticsearch Domain Security Group allows inbound HTTPS (port 443) traffic, specifically from the Firehose endpoint Security Group you just created.
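
As a rough illustration of how the two groups reference each other, here is a small Python sketch that builds the rule definitions as plain dictionaries. The group IDs `sg-firehose-example` and `sg-es-example` and the helper names are hypothetical placeholders. The sketch also assumes you want to restrict the Firehose outbound rule to the Elasticsearch group, which is tighter than simply allowing all outbound HTTPS. The dictionaries follow the `IpPermissions` shape used by the boto3 EC2 client, but nothing here calls AWS.

```python
HTTPS_PORT = 443  # HTTPS uses TCP port 443


def https_permission(peer_group_id: str) -> dict:
    """One IpPermissions entry: TCP 443 to or from another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": HTTPS_PORT,
        "ToPort": HTTPS_PORT,
        "UserIdGroupPairs": [{"GroupId": peer_group_id}],
    }


def build_rules(firehose_sg: str, es_sg: str) -> dict:
    """Build the outbound rule for Firehose and the inbound rule for Elasticsearch."""
    return {
        # Firehose group: allow outbound HTTPS towards the Elasticsearch group.
        "egress": {"GroupId": firehose_sg, "IpPermissions": [https_permission(es_sg)]},
        # Elasticsearch group: allow inbound HTTPS from the Firehose group only.
        "ingress": {"GroupId": es_sg, "IpPermissions": [https_permission(firehose_sg)]},
    }


# Hypothetical group IDs, for illustration only.
rules = build_rules("sg-firehose-example", "sg-es-example")
print(rules)

# With boto3 installed and credentials configured, these could be applied roughly as:
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_egress(**rules["egress"])
#   ec2.authorize_security_group_ingress(**rules["ingress"])
```

Both Security Groups then need to be selected when creating the Firehose Delivery Stream, as described above.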

A diagram of what is needed can be found in this blog post: Ingest streaming data into Amazon Elasticsearch Service within the privacy of your VPC with Amazon Kinesis Data Firehose.