spring-integration-aws - disable sync from sub folders in bucket


I am using Spring Integration AWS in one of my projects to download files from an S3 bucket to a local directory. I have specified the bucket location, and downloading has worked fine so far. The issue appeared when I created a sub folder in the bucket to keep archived files (those that have already been processed/downloaded): the S3 synchronizer started downloading the sub folder as well. I expect it to sync only the top-level folder, not the sub folders in the bucket. I can see an attribute in the spring-integration-aws 0.5 release that disables this behavior:

<xsd:attribute name="accept-sub-folders" type="xsd:string">

but I am unable to find it in release 2.0.0.

Below is the code:

@Bean
public S3InboundFileSynchronizer s3InboundFileSynchronizer ()
{
    S3InboundFileSynchronizer s3InboundFileSynchronizer = new S3InboundFileSynchronizer (amazonS3);
    s3InboundFileSynchronizer.setDeleteRemoteFiles (false);
    s3InboundFileSynchronizer.setPreserveTimestamp (true);
    s3InboundFileSynchronizer.setRemoteDirectory (remoteBucket);
    ChainFileListFilter fileListFilter = new ChainFileListFilter ();
    fileListFilter.addFilter (new S3RegexPatternFileListFilter (remoteFilesExtension));
    fileListFilter.addFilter (new S3PersistentAcceptOnceFileListFilter (metadataStore (), metadataStoreKeyPrefix));
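    // Apply the filter chain; without this call the filters above are never used.
    s3InboundFileSynchronizer.setFilter (fileListFilter);
    return s3InboundFileSynchronizer;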
}

and the poller config:

@Bean
@InboundChannelAdapter(channel = "fileArchiveChannel", poller = @Poller(fixedRate = "100000", maxMessagesPerPoll = "-1"))
public S3InboundFileSynchronizingMessageSource s3InboundFileSynchronizingMessageSource ()
{
    S3InboundFileSynchronizingMessageSource messageSource = new S3InboundFileSynchronizingMessageSource (s3InboundFileSynchronizer ());
    messageSource.setAutoCreateLocalDirectory (true);
    messageSource.setLoggingEnabled (true);
    File location = new File (localDirectory);
    Assert.notNull (location, "Local directory is not available");
    messageSource.setLocalDirectory (location);

    ChainFileListFilter fileListFilter = new ChainFileListFilter ();
    fileListFilter.addFilter (new RegexPatternFileListFilter (remoteFilesExtension));
    fileListFilter.addFilter (new FileSystemPersistentAcceptOnceFileListFilter (metadataStore (), metadataStoreKeyPrefix));
    messageSource.setLocalFilter (fileListFilter);

    return messageSource;
}

Is there any way to stop syncing the sub folders with spring-integration-aws 2.0.0?


There are 2 best solutions below

Best answer

To solve this issue, I updated the regex pattern passed to S3RegexPatternFileListFilter so that it excludes files whose key contains the archive folder path. The pattern only allows files with a txt or csv extension and does not allow paths that contain my archive folder name:

([^archive](\.(?i)(txt|csv))$)
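
One caveat about the pattern above: in a Java regex, `[^archive]` is a character class that matches a single character other than a, r, c, h, i, v, or e, so it does not by itself exclude a path segment named archive. A negative lookahead is the usual way to express "does not contain this prefix". Below is a minimal sketch using only `java.util.regex`, assuming the archived objects live under a logical folder named `archive/` and that the filter tests the whole object key; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ArchiveExcludingPattern {

    // Assumption: archived objects sit under a path segment named "archive/".
    // (?!(?:.*/)?archive/) rejects keys that contain such a segment, either at
    // the start of the key or after any '/'.
    // (?i:txt|csv) matches the extension case-insensitively.
    static final String KEY_REGEX = "^(?!(?:.*/)?archive/).*\\.(?i:txt|csv)$";

    static final Pattern KEY_PATTERN = Pattern.compile(KEY_REGEX);

    // True when the whole key matches, i.e. it is a txt/csv file outside archive/.
    static boolean accepts(String key) {
        return KEY_PATTERN.matcher(key).matches();
    }

    public static void main(String[] args) {
        String[] keys = {"report.csv", "notes.TXT", "archive/report.csv", "image.png"};
        for (String key : keys) {
            System.out.println(key + " -> " + accepts(key));
        }
    }
}
```

The regex string could then be handed to S3RegexPatternFileListFilter the same way `remoteFilesExtension` is in the question. If the remote directory includes a key prefix, the keys seen by the filter may carry that prefix, so it is worth checking the pattern against a few real keys first.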

Other answer

As far as I know, there is no notion of a sub folder in the AWS S3 protocol: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/using-folders.html

A folder is just an artificial way to group objects that share the same key prefix.

When we get an object from S3, we have its key, so you can configure an S3RegexPatternFileListFilter to skip objects whose keys contain your logical sub folder name.
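
To illustrate the point about prefixes, here is a minimal sketch that works on plain key strings (no AWS SDK calls) and treats `/` in a key as a logical folder boundary. It accepts a key only when it sits directly under a given prefix, which would skip every logical sub folder rather than just one. The class name, method name, and the `inbox/` prefix are hypothetical; wiring the check into a file list filter is left out.

```java
final class TopLevelKeySketch {

    // Returns true when the key is a direct child of the prefix: it starts with
    // the prefix and the remainder is non-empty and contains no further '/'.
    static boolean isDirectChild(String prefix, String key) {
        if (!key.startsWith(prefix)) {
            return false;
        }
        String rest = key.substring(prefix.length());
        // An empty remainder is the "folder" placeholder object itself.
        return !rest.isEmpty() && rest.indexOf('/') < 0;
    }

    public static void main(String[] args) {
        String[] keys = {"inbox/a.txt", "inbox/archive/a.txt", "inbox/"};
        for (String key : keys) {
            System.out.println(key + " -> " + isDirectChild("inbox/", key));
        }
    }
}
```

The same rule can be written as a regex such as `[^/]+\.(?i:txt|csv)` when the objects sit at the bucket root, since a key without any `/` cannot belong to a logical sub folder.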