Getting an anomaly score for every datapoint in SageMaker?


I'm very new to SageMaker, and I've run into a bit of confusion as to how to achieve the output I am looking for. I am currently attempting to use the built-in RCF algorithm to perform anomaly detection on a list of stock volumes, like this:

apple_stock_volumes = [123412, 465125, 237564, 238172]

I have created a training job, model, and endpoint, and I'm trying now to invoke the endpoint using boto3. My current code looks like this:

import json

import boto3

apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')

    # str.join needs strings, so convert each volume first
    body = " ".join(str(v) for v in apple_stock_volumes)
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    inference = json.loads(response['Body'].read())
    print(inference)

inference()

What I wanted was to get an anomaly score for every datapoint, and then to alert if the anomaly score was a few standard deviations above the mean. However, what I'm actually receiving is just a single anomaly score. The following is my output:

{'scores': [{'score': 0.7164874384}]}

Can anyone explain to me what's going on here? Is this an average anomaly score? Why can't I seem to get SageMaker to output a list of anomaly scores corresponding to my data? Thanks in advance!

Edit: I have already trained the model on a csv of historical volume data for the last year, and I have created an endpoint to hit.

Edit 2: I've accepted @maafk's answer, although the actual answer to my question was provided in one of his comments. The piece I was missing was that each data point must be on its own line in the csv input to the endpoint. Once I changed the join separator in body from " " to "\n", everything worked as expected.
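To make the difference concrete, here is a minimal sketch that builds both payloads locally and counts their lines, without calling the endpoint. The helper name build_csv_body is made up for illustration; the assumption, taken from the fix described above, is that the endpoint scores one record per line.

```python
def build_csv_body(values):
    """Serialize one single-feature record per line for a text/csv request."""
    return "\n".join(str(v) for v in values)


apple_stock_volumes = [123412, 465125, 237564, 238172]

# Original approach: everything ends up on one line.
space_joined = " ".join(str(v) for v in apple_stock_volumes)

# Fixed approach: one data point per line.
newline_joined = build_csv_body(apple_stock_volumes)

# Number of lines in each payload, i.e. the number of records sent.
print(len(space_joined.splitlines()))
print(len(newline_joined.splitlines()))
```

The space-joined body contains a single line, which matches the single score I was getting back, while the newline-joined body contains one line per volume.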

Best answer

In your case, you'll want to get anomaly scores for your historical stock volumes, compute the mean and standard deviation of those scores, and treat any score more than 3 standard deviations above the mean as an anomaly.

Update your code to do inference on multiple records at once:

import json

import boto3

apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')

    # New line for each record; str.join needs strings, so convert first
    body = "\n".join(str(v) for v in apple_stock_volumes)
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    inference = json.loads(response['Body'].read())
    print(inference)

inference()

This will return a list of scores, one per record.
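As a rough sketch of how to unpack that response, assuming it keeps the shape shown in the question ({'scores': [{'score': ...}]}) with one entry per record, you could flatten it into a plain list of floats. The helper name extract_scores and the sample values are made up for illustration; they are not real endpoint output.

```python
def extract_scores(inference):
    """Flatten a response shaped like {'scores': [{'score': x}, ...]} into [x, ...]."""
    return [record["score"] for record in inference["scores"]]


# Hypothetical response for four records (values invented for illustration).
sample_response = {
    "scores": [
        {"score": 0.71},
        {"score": 1.92},
        {"score": 0.88},
        {"score": 0.90},
    ]
}

scores = extract_scores(sample_response)
print(scores)
```

The resulting list lines up positionally with the records you sent, so it can be paired with the volumes or assigned to a DataFrame column.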

Assuming apple_stock_volumes_df has your volumes and the scores (after running inference on each record):

score_mean = apple_stock_volumes_df['score'].mean()
score_std = apple_stock_volumes_df['score'].std()
score_cutoff = score_mean + 3*score_std

There is a great example here showing this
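If you'd rather not depend on pandas, the same cutoff logic can be sketched with the standard library. This is only an illustration: the function name flag_anomalies and the score values are made up, and statistics.stdev is the sample standard deviation, which matches the default of pandas' .std().

```python
from statistics import mean, stdev


def flag_anomalies(historical_scores, new_scores, n_std=3):
    """Return the cutoff (mean + n_std * std) and a flag per new score."""
    cutoff = mean(historical_scores) + n_std * stdev(historical_scores)
    return cutoff, [score > cutoff for score in new_scores]


# Invented scores, chosen so the arithmetic is easy to follow:
# mean = 2.0, sample std = 1.0, so the cutoff is 2.0 + 3 * 1.0 = 5.0.
historical_scores = [1.0, 2.0, 3.0]
new_scores = [2.5, 6.0]

cutoff, flags = flag_anomalies(historical_scores, new_scores)
print(cutoff, flags)
```

Computing the cutoff from historical scores, rather than from the handful of points being checked, keeps a single extreme value from inflating its own threshold.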