Throttling in both SageMakerRuntime and SageMakerFeatureStoreRuntime


We have not configured any rate limits for either the Feature Store or the SageMaker AWS service, yet we are getting rate-limit-exceeded errors at very low RPS.

Case:

SageMaker: The normal load on the SageMaker service is 5 RPS. At 8 pm it goes to 12 RPS within a fraction of a millisecond because of a sale. That is not a high rate, but it is a sudden jump. At that moment SageMaker calls fail with the error below:

Message : Rate exceeded (Service: SageMakerRuntime, Status Code: 400, Request ID: 64a3ba3d-fc15-406b-b73a-ebc321e28ff8)

FeatureStore: The normal load on the FeatureStore service is 55 RPS. At 8 pm it goes to 100 RPS within a fraction of a millisecond because of the sale, which is a rate Feature Store should be able to handle. At that moment FeatureStore calls fail with the error below:

Message : Rate exceeded (Service: SageMakerFeatureStoreRuntime, Status Code: 400, Request ID: 5b7341bd-21e1-4ec9-a2d6-c204244e9780).

We have also observed some other errors when calling the Feature Store service. They are rare, about 3 times a day, but they do occur. The 2 errors are given below.

Error: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection reset at software.amazon.awssdk.core.exception.SdkClientException

Error: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The target server failed to respond at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111).

Should we be concerned about these 2 errors?

This is how we configured the Feature Store and SageMaker clients in our application.

Java version: 17, AWS SDK version: 2.20.26

FeatureStore client config:

    ClientOverrideConfiguration clientOverrideConfiguration = ClientOverrideConfiguration.builder()
            .apiCallTimeout(Duration.of(featureStoreConfig.getApiCallAttemptTimeout(), ChronoUnit.MILLIS))
            .retryPolicy(RetryPolicy.builder().numRetries(0).build())
            .build();

    featureStoreRuntimeClient = SageMakerFeatureStoreRuntimeClient.builder()
            .overrideConfiguration(clientOverrideConfiguration)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .region(Region.AP_SOUTHEAST_1).build();

SageMaker client config:

    ClientOverrideConfiguration clientOverrideConfiguration = ClientOverrideConfiguration.builder()
            .apiCallTimeout(Duration.of(sageMakerConfig.getApiCallAttemptTimeout(), ChronoUnit.MILLIS))
            .retryPolicy(RetryPolicy.builder().numRetries(0).build())
            .build();

    sageMakerRuntimeClient = SageMakerRuntimeClient.builder()
            .overrideConfiguration(clientOverrideConfiguration)
            .credentialsProvider(DefaultCredentialsProvider.create())
            .region(Region.AP_SOUTHEAST_1)
            .build();

Another interesting observation: if we hold the RPS at the peak level for some time, the rate-limit-exceeded error goes away. It seems the throttling only happens on a sudden increase in RPS. How do we solve this? Could having 0 retries be the issue? And why are the calls being throttled in the first place at such a low RPS?
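To make the retry part of the question concrete, here is a minimal sketch of the kind of client-side retry we have in mind: exponential backoff with full jitter around a call. This is only an illustration in plain Java, not our current code and not an AWS SDK API. The class `RetryWithBackoff`, its methods, and the parameters (`maxRetries`, `baseMillis`, `capMillis`) are hypothetical names. It also retries on every exception, whereas a real version would presumably retry only on throttling and connection errors.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of the AWS SDK. */
final class RetryWithBackoff {

    private RetryWithBackoff() {}

    /** Upper bound of the sleep before retry number `attempt` (0-based): base * 2^attempt, capped. */
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        // Limit the shift so the multiplication cannot overflow for sane base values.
        long exponential = baseMillis << Math.min(attempt, 20);
        return Math.min(capMillis, exponential);
    }

    /** Runs the task, retrying up to maxRetries times with a random sleep between attempts. */
    static <T> T call(Callable<T> task, int maxRetries, long baseMillis, long capMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Assumption: every exception is retryable. Filter by error type in real code.
                if (attempt >= maxRetries) {
                    throw e;
                }
                long bound = maxDelayMillis(attempt, baseMillis, capMillis);
                // Full jitter: sleep a uniformly random time in [0, bound].
                Thread.sleep(ThreadLocalRandom.current().nextLong(bound + 1));
            }
        }
    }
}
```

With something like `RetryWithBackoff.call(() -> invokeEndpoint(request), 3, 50, 1000)`, where `invokeEndpoint` stands in for our SageMaker or Feature Store call, a request that is throttled during the sudden jump would be retried a few times with spread-out delays instead of failing immediately. Setting `numRetries` above 0 in the SDK's `RetryPolicy` would presumably serve the same purpose.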
