How to use SageMaker Experiments trackers and trial components?


I'm completely confused by how SageMaker Experiments works. I used the SDK to create an Experiment and a Trial. Now I want to track job parameters, metadata and metrics.

Should I create Trial Components manually with the SDK, or let the SageMaker Estimator's fit() create them for me?

After creating my Experiment and Trial, I use the code below:

job.fit(
    inputs,
    experiment_config={
        "ExperimentName": reg_experiment.experiment_name,
        "TrialName": trial1.trial_name,
        "TrialComponentDisplayName": "training-with-RF1",
    },
    wait=False,
)

When I look in Studio, I see an automatically created Trial component named "training-with-RF1".

I see here and here that we can (can = must? should? could?...) also create Trials manually, for example with

from smexperiments import trial, tracker

my_trial = trial.Trial.create('AutoML')
my_tracker = tracker.Tracker.create()
my_tracker.log_parameter('learning_rate', 0.01)
my_trial.add_trial_component(my_tracker)

Or here with

Trial.create(
    trial_name=trial_name,
    experiment_name=mnist_experiment.experiment_name,
    sagemaker_boto_client=sm)

When I create trials manually like that, they appear as separate, empty trials, distinct from the trials created by the SageMaker jobs.

I'm confused because the AWS blog post says we have to create Trials manually; however, SageMaker Training jobs seem to create those trials on our behalf...

I'm completely confused by this service, can someone please help?


The best way to do this is to create an Experiment, a Trial and then pass the experiment config to the Training Job. The training job will automatically create a Trial Component and add it to the Trial.
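
A minimal sketch of that flow, assuming the sagemaker-experiments package (smexperiments), boto3 and AWS credentials are available, and that `estimator` is an already-configured SageMaker Estimator. The function names and example names are hypothetical; the SDK imports sit inside `launch_training`, so the helper that builds `experiment_config` can be tried without AWS.

```python
def build_experiment_config(experiment_name, trial_name, component_display_name):
    """Build the experiment_config dict accepted by Estimator.fit()."""
    return {
        "ExperimentName": experiment_name,
        "TrialName": trial_name,
        "TrialComponentDisplayName": component_display_name,
    }


def launch_training(estimator, inputs, experiment_name, trial_name, display_name):
    """Create the Experiment and Trial yourself, then let the training job
    create the Trial Component and attach it to that Trial.

    Hypothetical helper; requires sagemaker-experiments, boto3 and AWS credentials.
    """
    import boto3
    from smexperiments.experiment import Experiment
    from smexperiments.trial import Trial

    sm = boto3.client("sagemaker")
    experiment = Experiment.create(
        experiment_name=experiment_name, sagemaker_boto_client=sm
    )
    trial = Trial.create(
        trial_name=trial_name,
        experiment_name=experiment.experiment_name,
        sagemaker_boto_client=sm,
    )
    # No Tracker.create() here: the training job creates the Trial Component.
    estimator.fit(
        inputs,
        experiment_config=build_experiment_config(
            experiment.experiment_name, trial.trial_name, display_name
        ),
        wait=False,
    )
    return trial
```

In the AWS example notebooks, `Tracker.create()` is typically used for steps that are not SageMaker jobs (for example local preprocessing); for a training job, passing `experiment_config` to fit() is enough.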

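Depending on the type of training job you are using, some metrics will be tracked in the Trial Component automatically: built-in algorithms publish their own metrics, and for your own training code you can set this up through the metric_definitions argument of the Estimator (a list of metric names and regexes that SageMaker matches against the training logs).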
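
The matching itself happens on the SageMaker side, but it can help to check the regexes locally before launching a job. The sketch below assumes a training script that prints lines such as `train_rmse=0.52`; the metric names, the log format and the `extract_metrics` helper are hypothetical, and it only approximates SageMaker's behaviour by taking the first capture group of each match.

```python
import re

# Hypothetical metric names and log format; adapt the regexes to whatever
# your training script actually prints.
METRIC_DEFINITIONS = [
    {"Name": "train:rmse", "Regex": r"train_rmse=([0-9\.]+)"},
    {"Name": "validation:rmse", "Regex": r"val_rmse=([0-9\.]+)"},
]


def extract_metrics(log_lines, metric_definitions):
    """Apply each Regex to each log line and collect the first capture group."""
    found = {d["Name"]: [] for d in metric_definitions}
    for line in log_lines:
        for d in metric_definitions:
            match = re.search(d["Regex"], line)
            if match:
                found[d["Name"]].append(float(match.group(1)))
    return found


sample_log = [
    "epoch 1: train_rmse=0.52 val_rmse=0.61",
    "epoch 2: train_rmse=0.41 val_rmse=0.55",
]
print(extract_metrics(sample_log, METRIC_DEFINITIONS))
# {'train:rmse': [0.52, 0.41], 'validation:rmse': [0.61, 0.55]}
```

Once the regexes match, the same list would be passed as `metric_definitions=METRIC_DEFINITIONS` when constructing the Estimator.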

If you are running the training job in script mode, you can install sagemaker-experiments in the container running the job (or from the Python script itself, using subprocess.call) and import the Tracker class. You can then use the Tracker to log metrics from the training script to the Trial Component.
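
A sketch of the training-script side, assuming the sagemaker-experiments package and a script running inside a SageMaker training job that was started with an experiment_config. `log_epoch_metrics`, `main` and the shape of `history` are hypothetical; the SDK import is kept inside `main` so the logging helper works with any object that has a compatible `log_metric` method. As far as I know, calling `Tracker.load()` without arguments inside a training job loads the Trial Component that SageMaker created for that job.

```python
import subprocess
import sys


def log_epoch_metrics(tracker, history):
    """Log per-epoch metrics to the job's Trial Component.

    `tracker` is anything with log_metric(name, value, iteration_number=...),
    e.g. smexperiments.tracker.Tracker; `history` is a hypothetical list of
    {"loss": ..., "accuracy": ...} dicts, one per epoch.
    """
    for epoch, metrics in enumerate(history):
        for name, value in metrics.items():
            tracker.log_metric(name, value, iteration_number=epoch)


def main(history, hyperparameters):
    # Install the SDK at runtime if the training image does not ship it.
    subprocess.call([sys.executable, "-m", "pip", "install", "sagemaker-experiments"])
    from smexperiments.tracker import Tracker

    # With no arguments inside a training job, load() picks up the job's
    # own Trial Component; the context manager closes the tracker on exit.
    with Tracker.load() as tracker:
        tracker.log_parameters(hyperparameters)
        log_epoch_metrics(tracker, history)
```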

There are some examples here - https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-experiments

This is the documentation for sagemaker-experiments sdk - https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html