I’m using Lambda to load data records into Kinesis and often want to add up to 500K records. I batch these into chunks of 500 and use Boto's put_records method to send them to Kinesis. I sometimes see failures due to exceeding the allowed throughput.
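For reference, the loading pattern looks roughly like this (a simplified sketch; the stream name and record shape are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")

def load_records(records, stream_name="my-stream"):
    # put_records accepts at most 500 records per call, so send in chunks of 500.
    for i in range(0, len(records), 500):
        chunk = records[i:i + 500]
        response = kinesis.put_records(Records=chunk, StreamName=stream_name)
        if response["FailedRecordCount"] > 0:
            # Throughput exceeded -- this is the failure I need to retry.
            pass
```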
What is the best approach for retrying when this happens? Ideally I don’t want duplicate messages in the data stream, so I don’t want to simply resend all 500 records, but I’m struggling to see how to retry only the failed messages. The response from the put_records method doesn’t seem very useful.
Can I rely on the order of the response Records list being in the same order as the list I pass to putRecords?
I know I can increase the number of shards, but I’d like to significantly increase the number of parallel Lambda functions loading data into this Kinesis stream. We plan to partition data based on the source system, and I can’t guarantee that multiple functions won’t write to the same shard and exceed the allowed throughput. As a result, I don’t believe that increasing shards will remove the need for a retry strategy.
Alternatively, does anybody know if KPL will automatically handle this issue for me?
Yes, you can rely on the order of the response: the records in the response list are in the same order as the records in the request.
Please check the PutRecords response documentation: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html. To retry the failed records you have to build your own retry mechanism. I have written one in Python using a recursive function with an incremental wait between retries, along the lines shown below.
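A minimal sketch of that approach (the stream name, record shape, and the send_to_kinesis helper name are illustrative; the two constants are the ones mentioned below):

```python
import time

import boto3

# Tune these for your workload.
KINESIS_RETRY_COUNT = 10
KINESIS_RETRY_WAIT_IN_SEC = 0.1
KINESIS_STREAM_NAME = "my-stream"  # placeholder: your stream name

kinesis = boto3.client("kinesis")

def send_to_kinesis(records, retry_count=KINESIS_RETRY_COUNT):
    """Put a batch of records, recursively retrying only the failed ones."""
    response = kinesis.put_records(Records=records, StreamName=KINESIS_STREAM_NAME)

    if response["FailedRecordCount"] > 0:
        if retry_count <= 0:
            raise RuntimeError(
                f"{response['FailedRecordCount']} records still failing after all retries"
            )
        # The response Records list is in the same order as the request list,
        # so zip them and keep only the entries that carry an ErrorCode.
        failed_records = [
            record
            for record, result in zip(records, response["Records"])
            if "ErrorCode" in result
        ]
        # Incremental wait: sleep a little longer on each successive retry.
        attempt = KINESIS_RETRY_COUNT - retry_count + 1
        time.sleep(KINESIS_RETRY_WAIT_IN_SEC * attempt)
        send_to_kinesis(failed_records, retry_count - 1)

# Usage: each record is a dict like {"Data": b"...", "PartitionKey": "..."}
# send_to_kinesis(chunk_of_up_to_500_records)
```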
In the example above, you can change KINESIS_RETRY_COUNT and KINESIS_RETRY_WAIT_IN_SEC to suit your needs. You also have to ensure that your Lambda timeout is long enough to accommodate the retries. I am not sure about the KPL, but from the documentation it looks like it has its own retry mechanism: https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html