Kafka S3 source connector: How to avoid data loss when a node is down?


I have a system that runs on 3 AWS nodes and uses the Kafka Connect FilePulse S3 source connector to read data from an S3 bucket. I stream data events into the S3 bucket, do some processing on these events, and write the processed data back to the S3 bucket. To test the system's durability, I deliberately stop one of the nodes for a couple of minutes while data is being streamed, and then bring it back up. I sporadically experience the loss of a single event. I tried to find a pattern in this data loss by stopping the node that runs a specific broker/connector replica, the node that runs my data processing module, and the node that runs the broker that is the leader of a specific topic (e.g. __consumer_offsets). Unfortunately, none of my attempts led to a situation where I can reproduce this issue 100% of the time (rather than experiencing it sporadically).
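In case it helps, here is a minimal sketch of the kind of FilePulse source connector configuration involved. The property names follow the FilePulse 2.x documentation as I understand it; the connector name is inferred from the task-thread name in the log below, while the output topic, listing interval, status topic, and bootstrap servers are placeholders rather than my exact values:

# Sketch only: property names per FilePulse 2.x docs; topic names,
# interval, and bootstrap servers are placeholders.
name=s1-ilcera
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
tasks.max=1
topic=ingest-events
fs.listing.class=io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing
fs.listing.interval.ms=10000
aws.s3.bucket.name=metro-bucket-221hsbomum
aws.s3.bucket.prefix=in/
tasks.reader.class=io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3RowFileInputReader
# As far as I understand, the per-file committed offsets (the ones the
# WARN below says it failed to load) are stored in this Kafka-backed
# status topic, so its availability matters when a node goes down.
tasks.file.status.storage.class=io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore
tasks.file.status.storage.topic=connect-file-pulse-status
tasks.file.status.storage.bootstrap.servers=broker-1:9092,broker-2:9092,broker-3:9092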

I get the following errors in the S3 connector log (file-00036.in is the data event that is lost in this case):

WARN Failed to load committed offset for object file s3://metro-bucket-221hsbomum/in/file-00036.in. Previous offset will be ignored. Error: Failed to fetch offsets. (io.streamthoughts.kafka.connect.filepulse.source.DefaultFileRecordsPollingConsumer) [task-thread-s1-ilcera-0]
ERROR Failed to get object metadata from Amazon S3. Error occurred while making the request or handling the response for s3://metro-bucket-221hsbomum/in/file-00036.in: {} (io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3Storage) [task-thread-s1-ilcera-0]

Any idea why this problem occurs and how to avoid it? Alternatively, suggestions for things I could try in order to reproduce the issue consistently would also be very helpful.
