I'm uploading a file using PutObject, which works, but how do I tell if the MD5 checksum has been verified?
var s3Client = new AmazonS3Client();
string base64Checksum;
using (var md5 = MD5.Create())
{
    byte[] fileBytes = File.ReadAllBytes(filePath);
    byte[] hash = md5.ComputeHash(fileBytes, 0, fileBytes.Length);
    base64Checksum = Convert.ToBase64String(hash);
}
var putRequest = new PutObjectRequest()
{
    BucketName = bucketName,
    Key = objectKey,
    FilePath = filePath,
    ContentType = "application/txt",
    MD5Digest = base64Checksum
};
await s3Client.PutObjectAsync(putRequest);
In the response, ResponseMetadata.ChecksumAlgorithm is set to NONE and ChecksumValidationStatus is NOT_VALIDATED.
Does this mean the MD5 hash I've provided has not been validated?
Alternatively, if I set ChecksumAlgorithm to ChecksumAlgorithm.SHA256:
var putRequest = new PutObjectRequest()
{
    // ...
    ChecksumAlgorithm = ChecksumAlgorithm.SHA256
};
The checksum is calculated by AWS, but ChecksumAlgorithm and ChecksumValidationStatus still remain as above.
And even if I calculate it myself and set it:
var putRequest = new PutObjectRequest()
{
    // ...
    ChecksumSHA256 = sha256Checksum
};
I still get ChecksumAlgorithm set to NONE and ChecksumValidationStatus set to NOT_VALIDATED.
What am I doing wrong?
No, it has been validated.
MD5 checksum verification is done automatically by Amazon S3, based on an MD5 checksum that is sent via the Content-MD5 header.
This value can be generated by the SDK or provided as part of the PutObject request. The key point is that regardless of who provides the MD5 digest, the verification is done by AWS, as stated in the docs.
If the MD5Digest provided (which maps to the Content-MD5 header) is correct, the PutObjectRequest succeeds without any exceptions, signalling that the MD5 digest has been verified successfully.
S3 now guarantees that what you think you've uploaded is actually what has been uploaded. The MD5 of your local object matches the MD5 calculated by S3 - great.
Now, if the MD5Digest is incorrect (or the upload has been corrupted), the .NET SDK will throw an exception with an error code of BadDigest.
This is the .NET version of what happens, but note that the SDK is merely surfacing the S3 API's 400 Bad Request error.
By default, the SDK will generate an MD5 digest for you. It isn't calculated for you if either of the conditions below is true:
DisableMD5Stream is set to true
MD5Digest is already set (the condition that is true for your code)
You don't have to provide your own value.
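If you do choose to supply the digest yourself, the value is just the Base64 encoding of the raw MD5 hash, as in the question's code. The sketch below is one possible way to compute it from a stream so that a large file isn't loaded into memory first. The class and method names (ContentMd5, Compute, ComputeForFile) are made up for illustration and are not part of the AWS SDK.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not an SDK type.
public static class ContentMd5
{
    // Returns the Base64-encoded MD5 digest of everything readable from the stream.
    public static string Compute(Stream content)
    {
        using (var md5 = MD5.Create())
        {
            // ComputeHash(Stream) reads from the current position to the end.
            byte[] hash = md5.ComputeHash(content);
            return Convert.ToBase64String(hash);
        }
    }

    // Convenience overload: hashes a file by streaming it from disk.
    public static string ComputeForFile(string filePath)
    {
        using (var stream = File.OpenRead(filePath))
        {
            return Compute(stream);
        }
    }
}
```

The resulting string is what would be assigned to MD5Digest on the request, e.g. MD5Digest = ContentMd5.ComputeForFile(filePath).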
If you set the ChecksumAlgorithm to the desired algorithm, the AWS SDK will calculate the checksum. This calculated checksum is then included in the request sent to Amazon S3.
However, this checksum is not to be mistaken for the checksum that Amazon S3 generates after receiving the request. Amazon S3 compares its own checksum against the checksum that was sent in the request.
MD5 checksum verification aside, in Feb 2022 S3 announced support for 4 new checksum algorithms that can be used alongside the MD5 integrity check, each sent via its own header:
x-amz-checksum-crc32
x-amz-checksum-crc32c
x-amz-checksum-sha1
x-amz-checksum-sha256
As above, these checksums can be calculated by the SDKs or provided by the user. Once again, though, Amazon S3 checks the object against the provided checksum value and, if they do not match, Amazon S3 returns an error.
The behaviour is the same as above: if your CRC32/CRC32C/SHA1/SHA256 checksum value is incorrect, you'll get an exception with the error code of BadDigest and a message relating to whichever checksum algorithm you used.
All of this has nothing to do with the SDK.
The SDK does one of three things: it generates a checksum value and sends it with the right header name, it sends the checksum value that you've manually provided with the right header name, or it doesn't send the header at all (for no additional checksum verification).
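To make the "checksum value plus the right header name" pairing concrete, here is a minimal sketch of what gets sent for the SHA1 and SHA256 cases. It is not the SDK's implementation: AdditionalChecksum and BuildHeader are hypothetical names, the header names are the ones listed above, and the value is assumed to be the Base64-encoded digest. CRC32 and CRC32C are left out because the sketch sticks to System.Security.Cryptography.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not an SDK type.
public static class AdditionalChecksum
{
    // Builds the (header name, Base64 value) pair for an additional checksum.
    // Only SHA1 and SHA256 are handled in this sketch.
    public static (string HeaderName, string HeaderValue) BuildHeader(string algorithm, byte[] payload)
    {
        switch (algorithm.ToUpperInvariant())
        {
            case "SHA1":
                using (var sha1 = SHA1.Create())
                {
                    return ("x-amz-checksum-sha1", Convert.ToBase64String(sha1.ComputeHash(payload)));
                }
            case "SHA256":
                using (var sha256 = SHA256.Create())
                {
                    return ("x-amz-checksum-sha256", Convert.ToBase64String(sha256.ComputeHash(payload)));
                }
            default:
                throw new NotSupportedException("Algorithm not covered by this sketch: " + algorithm);
        }
    }
}
```

S3 then recomputes the same digest over the bytes it received and rejects the request if the two values differ.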
So what is ChecksumValidationStatus?
If you take a look at the S3 API's response object - which all of the SDKs are basically clients for - it's not actually there. It's a .NET-SDK-specific concept regarding the additional checksum algorithms and is not related to MD5 checksum validation whatsoever.
The field is not related to Amazon S3's verification of the checksum value. That is denoted by the S3 response and in the .NET SDK's case: no exception = ✅ valid & verified checksum.
So, let's say we upload an object, payroll.txt, with a (dummy) SHA256 checksum value of a. No exceptions are thrown, so we know S3 has validated that the object was not corrupted in transit, as its calculated checksum value for payroll.txt is also a. We are now confident that S3 is truly storing payroll.txt as originally expected.
On another device, we send a GetObjectRequest via the .NET SDK to download payroll.txt. We know that S3 is truly storing payroll.txt, but how do we know that the .NET SDK has truly downloaded payroll.txt as intended?
That's where ChecksumValidationStatus comes into play, which should be checked on the response to the GetObjectRequest. The fact that it is even accessible on the PutObjectResponse seems like a leaky abstraction to me.
This is why, even if we've specified an additional SHA256 checksum value to validate, the PutObjectResponse always has a status of NOT_VALIDATED. The SDK client doesn't validate the checksum of the object on a PutObject, so it doesn't make sense to have a status for it there.
With that out of the way, the field exists so that the SDK can validate that it has downloaded the right object and that it hasn't been corrupted on the way. As long as you've set ChecksumMode on the request to ChecksumMode.ENABLED, the SDK will obtain and populate the checksum fields, e.g. GetObjectResponse.ChecksumSHA256. ChecksumMode maps to the x-amz-checksum-mode header.
Of course, you can then manually verify this, but the SDK tries to help by aiming to change the status of ChecksumValidationStatus from PENDING_RESPONSE_READ (its initial value post GET) to either SUCCESSFUL or INVALID, based on the hash it generates.
It can only generate the hash of the downloaded object once it has been fully read, i.e. on the closure of the ResponseStream (a standard .NET Stream).
You can see this within the comment for the ChecksumValidationStatus enum in the public source code.
There seems to be a bug where the validation status on the response to the GetObjectRequest never changes from PENDING_RESPONSE_READ to SUCCESSFUL, or even INVALID, once the stream is fully closed.
My sample code demonstrated this: the second status value in its output was not what it should have been.
You should depend on an AmazonClientException instead.
I reached out to the AWS SDK for .NET team for confirmation that this field is currently unused.
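Since the status field can't be relied on, the manual verification mentioned earlier can be sketched roughly as follows. This is only an illustration using the .NET base library: DownloadVerifier and MatchesSha256 are hypothetical names, and it assumes the expected value is the Base64-encoded SHA256 digest of the whole object.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not an SDK type.
public static class DownloadVerifier
{
    // Reads the stream to its end and compares the Base64-encoded SHA256 digest
    // of its contents with the expected checksum value.
    public static bool MatchesSha256(Stream downloaded, string expectedBase64)
    {
        using (var sha256 = SHA256.Create())
        {
            byte[] actual = sha256.ComputeHash(downloaded);
            string actualBase64 = Convert.ToBase64String(actual);
            // Ordinal comparison: Base64 text is case-sensitive.
            return string.Equals(actualBase64, expectedBase64, StringComparison.Ordinal);
        }
    }
}
```

With the SDK, the stream would be the ResponseStream and the expected value would be GetObjectResponse.ChecksumSHA256, which is only populated when ChecksumMode is enabled on the request. Note that hashing consumes the stream, so the bytes would need to be saved or buffered separately if they are also needed afterwards.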
In conclusion:
Amazon S3 can optionally validate the MD5 checksum for objects and/or one extra checksum value to ensure integrity of the object uploaded
The extra checksum value can use one of the following algorithms: CRC32, CRC32C, SHA1 or SHA256
The checksum values can be provided by the user, or generated by the SDK depending on SDK configuration
No errors returned by the S3 API on the PutObject request means that S3 has verified the checksum of the object successfully
The SDK implementation may offer the option of validating the checksum value that is returned by the API, as long as the SDK has been configured to obtain the checksum value from S3 by setting x-amz-checksum-mode to ENABLED
The .NET SDK's ChecksumValidationStatus field currently shouldn't be used, so catch AmazonClientExceptions instead