AWS MSK Connect: writing data to S3 without the additional {topic_name} folder

I am consuming a Kafka topic (say, topic1) with AWS MSK Connect and writing it to AWS S3.

The following is a simplified MSK connector config:

topics.dir=folder1/folder2
topics=topic1

The topic is consumed and written to S3 as expected.

The only problem is that an extra folder named after the topic is created in the S3 bucket, so objects end up under folder1/folder2/topic1 instead of folder1/folder2.

My question is: is there any way to disable the creation of a new folder per topic name?

Any help is highly appreciated.

There is 1 best solution below

EdbE On

Assuming you are using the Confluent S3 sink connector, you cannot avoid having the topic name as part of an object's path.

Here is the relevant code from the connector:

  private String fileKeyToCommit(String dirPrefix, long startOffset) {
    String name = tp.topic()
                      + fileDelim
                      + tp.partition()
                      + fileDelim
                      + String.format(zeroPadOffsetFormat, startOffset)
                      + extension;
    return fileKey(topicsDir, dirPrefix, name);
  }

As you can see, the connector unconditionally adds the topic name to the object's path.
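To make the effect concrete, here is a rough Python re-creation of what that method computes. It is only a sketch, not the connector's code: the `+` delimiter, the 10-digit zero padding, the `.json` extension, the way `file_key` joins its parts, and the `topic1/partition=0` directory prefix are all assumptions based on the connector's usual defaults, and the function names are made up for illustration.

```python
def file_key(topics_dir: str, dir_prefix: str, name: str, dir_delim: str = "/") -> str:
    # Assumed behaviour: join topics.dir, the partitioner's directory prefix
    # and the file name with the directory delimiter.
    return dir_delim.join([topics_dir, dir_prefix, name])


def file_key_to_commit(topic: str, partition: int, start_offset: int,
                       topics_dir: str, dir_prefix: str,
                       file_delim: str = "+", extension: str = ".json",
                       pad_width: int = 10) -> str:
    # Mirrors the quoted Java: topic + delim + partition + delim + padded offset + extension.
    name = (f"{topic}{file_delim}{partition}{file_delim}"
            f"{start_offset:0{pad_width}d}{extension}")
    return file_key(topics_dir, dir_prefix, name)


if __name__ == "__main__":
    # dir_prefix is an assumed value of what the partitioner would produce.
    print(file_key_to_commit("topic1", 0, 0, "folder1/folder2", "topic1/partition=0"))
```

Under these assumptions the topic name shows up in the file name regardless of what the directory part looks like.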

As a workaround, you can enable S3 event notifications for newly created objects and rename the objects in a Lambda function.
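A minimal sketch of such a Lambda in Python, assuming the function is subscribed to the bucket's object-created notifications and that boto3 is available in the runtime. S3 has no rename operation, so the handler copies each object to the new key and deletes the original. `TOPICS_DIR`, `TOPIC`, and `strip_topic_folder` are illustrative names taken from the question's config, not part of any AWS or Confluent API.

```python
from urllib.parse import unquote_plus

TOPICS_DIR = "folder1/folder2"  # topics.dir from the question
TOPIC = "topic1"                # topics from the question


def strip_topic_folder(key: str, topics_dir: str = TOPICS_DIR, topic: str = TOPIC) -> str:
    """Drop the '<topic>/' folder that sits directly under topics_dir.

    Keys that do not start with '<topics_dir>/<topic>/' are returned unchanged,
    which also stops the function from reprocessing objects it already moved.
    """
    base = topics_dir.strip("/")
    prefix = f"{base}/{topic}/"
    if not key.startswith(prefix):
        return key
    return f"{base}/{key[len(prefix):]}"


def handler(event, context):
    # Imported here so the key-rewriting logic can be tested without boto3.
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        new_key = strip_topic_folder(key)
        if new_key == key:
            continue  # outside the topic folder, or already moved
        # "Rename" = copy to the new key, then delete the original.
        s3.copy_object(Bucket=bucket, Key=new_key,
                       CopySource={"Bucket": bucket, "Key": key})
        s3.delete_object(Bucket=bucket, Key=key)
```

Two things to watch: the copy itself fires another object-created event, so it helps to restrict the notification to the folder1/folder2/topic1/ prefix in addition to the guard in the code, and a single `copy_object` call only handles objects up to 5 GB. Keep in mind that the connector does not know about the moved objects, so anything that relies on the original layout will no longer find them.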