AWS SageMaker DeepAR validation error: Additional properties are not allowed ('training' was unexpected)

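I'm trying to train a DeepAR model with the SageMaker Python SDK, but the training job fails while validating the input data configuration. Here is the code: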

estimator = sagemaker.estimator.Estimator(
    image_uri=image_name,
    sagemaker_session=sagemaker_session,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.m5.large",
    base_job_name="deepar-stock",
    output_path=s3_output_path,
)

hyperparameters = {
    "time_freq": "24H",
    "epochs": "100",
    "early_stopping_patience": "10",
    "mini_batch_size": "64",
    "learning_rate": "5E-4",
    "context_length": str(context_length),
    "prediction_length": str(prediction_length),
    "likelihood": "gaussian",
}

estimator.set_hyperparameters(**hyperparameters)

%%time

estimator.fit(inputs=f"{s3_data_path}/train/")

And when I try to train the model I get the following error (in its entirety).

---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<timed eval> in <module>

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
    681         self.jobs.append(self.latest_training_job)
    682         if wait:
--> 683             self.latest_training_job.wait(logs=logs)
    684 
    685     def _compilation_job_name(self):

/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in wait(self, logs)
   1626         # If logs are requested, call logs_for_jobs.
   1627         if logs != "None":
-> 1628             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   1629         else:
   1630             self.sagemaker_session.wait_for_job(self.job_name)

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll, log_type)
   3658 
   3659         if wait:
-> 3660             self._check_job_status(job_name, description, "TrainingJobStatus")
   3661             if dot:
   3662                 print()

/opt/conda/lib/python3.7/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3218                 ),
   3219                 allowed_statuses=["Completed", "Stopped"],
-> 3220                 actual_status=status,
   3221             )
   3222 
UnexpectedStatusException: Error for Training job deepar-2021-07-31-22-25-54-110: Failed. Reason: ClientError: Unable to initialize the algorithm. Failed to validate input data configuration. (caused by ValidationError)

Caused by: Additional properties are not allowed ('training' was unexpected)

Failed validating 'additionalProperties' in schema:
    {'$schema': 'http://json-schema.org/draft-04/schema#',
     'additionalProperties': False,
     'anyOf': [{'required': ['train']}, {'required': ['state']}],
     'definitions': {'data_channel': {'properties': {'ContentType': {'enum': ['json',
                                                                              'json.gz',
                                                                              'parquet',
                                                                              'auto'],
                                                                     'type': 'string'},
                                                     'RecordWrapperType': {'enum': ['None'],

On instance:
    {'training': {'RecordWrapperType': 'None',
                  'S3DistributionType': 'FullyReplicated',
                  'TrainingInputMode': 'File'}}

The error says 'training' was unexpected, and the block under "On instance:" shows a single key named 'training', but I never set that name anywhere in my code. I've looked at other pages for help and can't find a straight answer. I believe my data is structured correctly. I suspected the hyperparameters, but I'm not sure of that. How do I fix this?

There are 2 best solutions below

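I needed to pass the input as a dictionary keyed by the channel name "train" instead of a bare S3 path. A bare string is sent under a default channel named "training", which is the name DeepAR's schema rejects. The fit call now looks like this: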

data_channels = {"train": f"{s3_data_path}/train/"}

estimator.fit(inputs=data_channels)
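
To see why the bare string fails, here is a minimal sketch of the channel naming involved. It is a simplified stand-in, not the SDK's actual code: `to_channels` and `unexpected_channels` are hypothetical helpers, the bucket name is made up, and the allowed channel names are an assumption taken from the DeepAR documentation (the schema in the error is truncated).

```python
# Simplified stand-in for how fit() inputs become named channels.
# This is NOT the SageMaker SDK's code; the helper names are hypothetical.

DEFAULT_CHANNEL = "training"         # name a bare string input ends up under
DEEPAR_CHANNELS = {"train", "test"}  # assumption: names from the DeepAR docs


def to_channels(inputs):
    """Return a {channel_name: s3_uri} mapping for a string or dict input."""
    if isinstance(inputs, str):
        # A plain S3 path carries no channel name, so a default is used.
        return {DEFAULT_CHANNEL: inputs}
    return dict(inputs)


def unexpected_channels(inputs):
    """List the channel names the assumed DeepAR schema would reject."""
    return sorted(set(to_channels(inputs)) - DEEPAR_CHANNELS)


s3_data_path = "s3://example-bucket/deepar"  # hypothetical bucket and prefix

# Bare string: lands in the "training" channel, which DeepAR does not accept.
print(unexpected_channels(f"{s3_data_path}/train/"))

# Explicit dictionary: the channel is named "train", so nothing is rejected.
print(unexpected_channels({"train": f"{s3_data_path}/train/"}))
```

The first call reports the same 'training' key that appears under "On instance:" in the error, and the second reports nothing.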

The problem is the channel name, not the hyperparameters. When you pass a plain S3 path to fit(), the SageMaker SDK sends it as a single channel named "training". Many estimators accept that, but the DeepAR container validates channel names against the schema shown in the error, which expects "train" (DeepAR also takes an optional "test" channel), so "training" is rejected. Passing a dictionary lets you choose the channel names yourself.

The names matter because all SageMaker estimators (built-in and custom) run in containers, and a new container is started for each training job. In File mode, which is the mode shown in the error, SageMaker copies each channel's data from S3 into a directory inside the container named after the channel: /opt/ml/input/data/<channel name>. With inputs of the form data = {'train': x, 'test': y}, the data at data['train'] is copied to /opt/ml/input/data/train, and if you had set up DeepAR for testing, the data at data['test'] would be copied to /opt/ml/input/data/test. The algorithm then reads from those directories, so the keys have to be the names it expects. A good way to learn this is to build a custom model in script mode, which forces you to understand exactly where the container puts each channel and how to read from it.
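
For a custom script-mode container, a training script might locate its channels roughly like this. This is a minimal sketch, not something DeepAR exposes: it assumes the container sets the SM_CHANNEL_<NAME> environment variables (as the SageMaker training toolkit does) and falls back to the conventional /opt/ml/input/data/<channel name> directories. The `parse_args` function and the argument names are illustrative.

```python
import argparse
import os


def parse_args(argv=None):
    """Resolve the local directory for each input channel."""
    parser = argparse.ArgumentParser()
    # Each channel passed to fit() shows up as its own directory.
    # Assumption: SM_CHANNEL_TRAIN / SM_CHANNEL_TEST are set by the container;
    # otherwise fall back to the conventional channel directories.
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--test",
        default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"),
    )
    return parser.parse_args(argv)


args = parse_args([])
print("train channel directory:", args.train)
print("test channel directory:", args.test)
```

A channel named "training" would appear under a different directory (and a different environment variable), which is why a script or algorithm that looks for "train" never finds it.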