AWS DataSync causes a double trigger for a Lambda event


I use the AWS DataSync service to sync two buckets, A and B. The DataSync task is configured to check integrity during transfer and to copy only new data to the other bucket (B).

I have a Lambda function that is configured to trigger on new files in bucket B.

Every file created via DataSync triggers the Lambda twice.

I created a CloudWatch dashboard to count the files that triggered the Lambda event, and the count is doubled. The event for both triggers is the same, a "PUT":

 'event_time': '2023-02-27T12:46:48.926Z', 'event_source': 'aws:s3', 'aws_region': 'eu-west-1', 'event_name': 'ObjectCreated:Put'}

The timestamps are almost the same, with a 2-3 second difference.

I set the number of attempts to 0, and I set the event timeout to less than the Lambda timeout.

Nothing helped.

Has anyone faced the same issue and solved it?


There is 1 best solution below

1
On BEST ANSWER

Edit:

After working with an AWS Support Engineer from DataSync, we confirmed that this behavior is by design of how DataSync works.

DataSync treats S3 like any other file system: it creates the file (a 0-byte object) and then uploads the data to it (the object with the actual file size). Because S3 is object storage, it doesn't support in-place edits, so the upload results in a second PUT.

"When using object versioning in Amazon S3, running a DataSync task once might create more than one version of an Amazon S3 object."

That statement is about versioning-enabled buckets, but we confirmed the same behavior for a versioning-disabled bucket as well.

https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html#create-s3-location-considerations

Unfortunately, S3 Event Notifications cannot filter or exclude objects by size, so 0-byte objects can't be skipped at the source. The conclusion for my case was that I have to work around this in my downstream processes, e.g. by excluding messages with a 0-byte object size.
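For the Lambda case in the question, the downstream workaround could look roughly like the sketch below. It is only an illustration, not something AWS Support provided: `non_empty_records` and `process_object` are hypothetical names, and it assumes the standard S3 event notification layout, where each record carries the object size under `s3.object.size`. Note that it also skips files that are legitimately empty.

```python
from urllib.parse import unquote_plus


def non_empty_records(event):
    """Return the S3 records whose object size is greater than zero."""
    kept = []
    for record in event.get("Records", []):
        obj = record.get("s3", {}).get("object", {})
        # A missing size is treated as 0 here (an assumption of this sketch).
        if obj.get("size", 0) > 0:
            kept.append(record)
    return kept


def process_object(bucket, key):
    """Hypothetical placeholder for the real per-file work."""
    print(f"processing s3://{bucket}/{key}")


def lambda_handler(event, context):
    records = non_empty_records(event)
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        process_object(bucket, key)
    return {"processed": len(records)}
```

With this in place the function is still invoked twice per file, but the invocation for the 0-byte placeholder returns without doing any work.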


I'm not able to comment at the moment, so I'm posting this in the answer section (I'll try to complete it as an answer as I troubleshoot).

I had a similar issue, but with SQS as the event notification destination. I enabled S3 server access logging and queried the request history, and found that the DataSync job for some reason made two REST.PUT.OBJECT requests. I also noticed that in the second request the object size is actually 0, but I think the duplicate request will definitely make S3 deliver the event notification twice. I'm still troubleshooting it.
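For the SQS destination, the same size check can be applied when reading messages from the queue. This is a minimal sketch under a few assumptions: S3 publishes directly to the queue (no SNS topic in between), so the message body is the S3 event JSON itself, and `s3_records_from_sqs_body` is a hypothetical helper name.

```python
import json


def s3_records_from_sqs_body(body):
    """Parse one SQS message body and return only non-empty S3 object records."""
    message = json.loads(body)
    # S3 sends a test message without "Records" when the notification is configured.
    if message.get("Event") == "s3:TestEvent":
        return []
    return [
        record
        for record in message.get("Records", [])
        if record.get("s3", {}).get("object", {}).get("size", 0) > 0
    ]
```

A consumer would call this on each received message body and delete the message afterwards either way, so the 0-byte notifications are dropped rather than retried.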