I have a 900 MB file that I'd like to download to disk from S3 if it isn't already in place downloaded. Is there an easy way for me to only download the file if it isn't already in place? I know S3 supports querying MD5 checksum of files, but I'm hoping not to have to build this logic myself.
How do I download an S3 file only if it has changed?
6.7k Views Asked by Ztyx At
2
There are 2 best solutions below
0

I have used below code to download S3 files which have timestamp greater than the local folder timestamp. First it's check if any of the files in S3 folder have timestamp greater than the local folder timestamp. If yes then download those files only.
TransferManager transferManager = TransferManagerBuilder.standard().build();
AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().build();
Path location = Paths.get("/data/test/");
FileTime lastModifiedTime = null;
try {
lastModifiedTime = Files.getLastModifiedTime(location, LinkOption.NOFOLLOW_LINKS);
} catch (IOException e) {
e.printStackTrace();
}
Date lastUpdatedTime = new Date(lastModifiedTime.toMillis());
ObjectListing listing = amazonS3.listObjects("bucket", "test-folder");
List<S3ObjectSummary> summaries = listing.getObjectSummaries();
for (S3ObjectSummary os: summaries) {
if(os.getLastModified().after(lastUpdatedTime)) {
try {
String fileName="/data/test/"+os.getKey();
Download multipleFileDownload = transferManager.download(bucket, os.getKey(), new File(fileName));
while (multipleFileDownload.isDone() == false) {
Thread.sleep(1000);
}
}catch(InterruptedException i){
LOG.error("Exception Occurred while downloading the file ",i);
}
}
}
You can use AWS CLI's
s3 sync
command.According to this forum thread, you can use
sync
to synchronize only one file:It says: sync the given paths, exclude all files, but include
"File.txt"
- so it will sync only"File.txt"
under those given paths.Or with the Java SDK:
According to the javadoc, there is a
getObjectMetadata
method which will return information about an S3 object (file) without downloading it's contents.The method returns an
ObjectMetadata
object which can give you some useful information:getLastModified
method:getContentMD5
method:getETag
method: