ModelAssetPathNotFoundInStorage error when mlflow.sklearn.autolog() used to train within an Azure ML YAML Pipeline

  • The YAML appears correct: there are no validation issues, and the pipeline is visible in the Azure ML Studio GUI.
  • I'm assuming the error is thrown by mlflow.sklearn.autolog() when the fit() method is called.
  • A full stack trace is not available; the exception shown below is everything that Azure ML Studio displays.
  • Commenting out the call to save_outputs() raises the same error, which is why I suspect the MLflow SDK is attempting to autolog the model.
  • I haven't included the code for the predict job below; that step never executes because the sweep job in the pipeline fails first.
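
Since the model registry reports that no blobs exist at the model asset path, one way to narrow this down is to check, at the end of train.py, that the folder passed as --outputs_model actually contains a saved MLflow model. The sketch below is an assumption about how such a check could look and is not part of the original script: list_model_files and check_model_dir are hypothetical helper names, and the only MLflow-specific fact relied on is that a saved MLflow model folder contains a file named MLmodel.

```python
from pathlib import Path


def list_model_files(model_dir):
    """Return the sorted relative paths of all files under model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def check_model_dir(model_dir):
    """Raise if model_dir does not look like a saved MLflow model folder."""
    files = list_model_files(model_dir)
    # A saved MLflow model keeps its metadata in a top-level "MLmodel" file.
    if "MLmodel" not in files:
        raise FileNotFoundError(
            f"No MLmodel file found in {model_dir!r}; files present: {files}"
        )
    return files
```

Calling check_model_dir(kwargs.get("outputs_model")) as the last line of main() should make a trial fail inside the training script, with a list of what was written, rather than later at model registration.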

Exception raised in Azure ML GUI

UserErrorException:
    Message: Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}"
    }
}
Marking the experiment as failed because initial child jobs have failed due to user error
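
The nested message above is hard to read at a glance. As a rough sketch (not an Azure ML API), the error code and the storage path can be pulled out of the message text with a regular expression, so the run ID in the path can be matched against the child runs listed in the Studio GUI. MESSAGE is a shortened excerpt of the exception text, and extract_detail is a hypothetical helper name.

```python
import re

# Shortened excerpt of the exception message shown above.
MESSAGE = (
    "Model asset creation API failed with {'additional_properties': "
    "{'message': 'The request is invalid.', 'details': [{'code': "
    "'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage "
    "at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], "
    "'code': 'BadRequest', 'statusCode': 400}}"
)


def extract_detail(message):
    """Return (error_code, asset_path) from the first details entry, or None."""
    match = re.search(
        r"'code': '(?P<code>\w+)', "
        r"'message': '[^']*model asset path: (?P<path>[^']+)'",
        message,
    )
    if match is None:
        return None
    return match.group("code"), match.group("path")


code, asset_path = extract_detail(MESSAGE)
print(code)
print(asset_path)
```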

CLI Command

$ az ml job create --subscription <REDACTED> --resource-group <REDACTED> --workspace-name <REDACTED> --file /home/azureuser/cloudfiles/code/Users/<REDACTED>/repos/<REDACTED>/src/assets/pipeline_tune.yml --stream

RunId: quirky_bone_tf250gdlfg
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>

Streaming logs/azureml/executionlogs.txt
========================================

[2023-01-12 22:58:42Z] Submitting 1 runs, first five are: <REDACTED>
[2023-01-12 23:03:46Z] Execution of experiment failed, update experiment status and cancel running nodes.

Execution Summary
=================
RunId: <REDACTED>
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Exception : 
 {
    "error": {
        "code": "UserError",
        "message": "Pipeline has some failed steps. See child run or execution logs for more details.",
        "message_format": "Pipeline has some failed steps. {0}",
        "message_parameters": {},
        "reference_code": "PipelineHasStepJobFailed",
        "details": []
    },
    "environment": "<REDACTED>",
    "location": "<REDACTED>",
    "time": "2023-01-12T23:03:45.982134Z",
    "component_name": ""
} 

Folder structure

src/
  assets/
    component_train.yml
    pipeline_tune.yml
  train.py

src/assets/pipeline_tune.yml

# References
# ----------
# - How to create component pipelines
#   - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Pattern reference
#   - https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/pipeline_with_hyperparameter_sweep
# - Pipeline schema
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-pipeline
# - Sweep Job schema (hyperparameter tuning)
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-sweep
# - Core Azure ML YAML syntax
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

# -------------------------------------------------------------------
# Pipeline settings
# - Having inputs defined at the pipeline level, instead of the first
#  job, allows for parameterisation of the pipeline via both CLI/SDK
experiment_name: <REDACTED>
description: Tune hyperparameters for training a scikit-learn SVM on the Iris dataset.
settings:
    default_compute: azureml:aml-compute-cpu
    default_datastore: azureml:workspaceblobstore
inputs:
  data: 
    type: uri_file
    mode: ro_mount
    path: wasbs://[email protected]/iris.csv
outputs:
  predict:
    type: uri_folder
    mode: rw_mount
    path: azureml://datastores/workspaceblobstore/paths/<REDACTED>

# -------------------------------------------------------------------
# Jobs
jobs:
  
  # Tune job
  tune:
    type: sweep
    inputs:
      data: ${{parent.inputs.data}}
    outputs:
      model:
        type: mlflow_model
      test_data:
        type: uri_folder

    trial: ./component_train.yml

    search_space:
      c_value:
        type: uniform
        min_value: 0.5
        max_value: 0.9
      kernel:
        type: choice
        values:
          - rbf
          - linear
          - poly
      coef0:
        type: uniform
        min_value: 0.1
        max_value: 1

    sampling_algorithm: random

    objective:
      goal: maximize
      primary_metric: training_f1_score

    limits:
      max_total_trials: 20
      max_concurrent_trials: 10
      timeout: 7200

  # Score test data
  predict:
    type: command
    inputs:
      model: ${{parent.jobs.tune.outputs.model}}
      test_data: ${{parent.jobs.tune.outputs.test_data}}
    outputs:
      predictions: ${{parent.outputs.predict}}
    component: ./component_predict.yml

src/assets/component_train.yml

# References
# ----------
# - How to create component pipelines
#   - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Command schema
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-command
# - Core Azure ML YAML syntax
#   - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

# -------------------------------------------------------------------
# Command settings
version: 1
description: Training a scikit-learn SVM on the Iris dataset
environment: azureml:<REDACTED>:1
inputs: 
  data:
    type: uri_file
  c_value:
    type: number
    default: 1.0
  kernel:
    type: string
    default: rbf
  coef0: 
    type: number
    default: 0
outputs:
  model:
    type: mlflow_model
  test_data:
    type: uri_folder


# -------------------------------------------------------------------
# Job
code: ..
command: >-
  python train.py
  --data ${{inputs.data}}
  --C ${{inputs.c_value}}
  --kernel ${{inputs.kernel}}
  --coef0 ${{inputs.coef0}}
  --outputs_model ${{outputs.model}}
  --outputs_test_data ${{outputs.test_data}}

src/train.py

"""
Notes
-----
- Imports in this file must match the imports in `score.py` to allow
  pickle objects to be loaded correctly by `score.py`

References
----------
- Azure ML Environments and ScriptRunConfig for training
    - i.e How to execute this script against Azure ML compute cluster
    - https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-environments-for-training
- Azure ML - Hyperparameter tuning
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
- Azure ML - Hyperparameter tuning in Azure Machine Learning pipeline
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline#how-to-do-hyperparameter-tuning-in-azure-machine-learning-pipeline
- Azure ML - Random, Grid, Bayesian sampling
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#sampling-the-hyperparameter-space
- Azure ML - How to Train scikit-learn
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn#prepare-the-training-script
- Azure ML - How to train TensorFlow
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-tensorflow
- Azure ML - How to train Keras
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-keras
- Azure ML - How to train PyTorch
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch
- Azure ML - Logging with MLFlow
    - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=jobs#getting-started
- MLFlow - Autologging of frameworks
    - https://mlflow.org/docs/latest/tracking.html#automatic-logging
"""
import argparse

from distutils.dir_util import copy_tree
from pathlib import Path

import mlflow.sklearn
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def get_data(data):
    """Get data and return train/test splits"""
    df = pd.read_csv(data)
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Return split data
    return X_train, X_test, y_train, y_test

def get_hyperparameters(**kwargs):
    """Set hyperparameters here
    
    References
    ----------
    - Understand how to pass hyperparameters to a sklearn pipeline
        - https://stackoverflow.com/questions/66388056/why-does-sklearn-pipeline-set-params-not-work
    """
    hyperparameters = {
        "estimator__C": kwargs.get("c_value"),
        "estimator__kernel": kwargs.get("kernel"),
        "estimator__coef0": kwargs.get("coef0"),
    }
    return hyperparameters

def save_outputs(model, model_dir, X_test, y_test, test_data_dir):
    """Save outputs of the training process"""
    # Save model
    local_dir = "model"
    mlflow.sklearn.save_model(model, local_dir)
    copy_tree(local_dir, model_dir)

    # Save test data
    X_test.to_csv(Path(test_data_dir) / "X_test.csv", index=False)
    y_test.to_csv(Path(test_data_dir) / "y_test.csv", index=False)

def train_model(hyperparameters, X_train, y_train):
    """Train the model with your chosen framework here"""
    # Model architecture
    model = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("estimator", SVC()),
        ]
    )

    # Set hyperparameters for training run
    model.set_params(**hyperparameters)

    # Train model
    model.fit(X_train, y_train)

    return model

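def parse_args():
    """Parse args and hyperparameters"""
    parser = argparse.ArgumentParser()

    # Parse mandatory args
    # - Declared as --flags to match the command in component_train.yml
    parser.add_argument("--data", help="Path to data for training", type=str, required=True)
    parser.add_argument("--outputs_model", help="Path to the output folder for the model", type=str, required=True)
    parser.add_argument("--outputs_test_data", help="Path to the output folder for the test data", type=str, required=True)

    # Parse hyperparameter args
    # - The command passes --C; it is stored as c_value for get_hyperparameters()
    # - Numeric hyperparameters are parsed as float so SVC receives numbers, not strings
    parser.add_argument("--C", dest="c_value", help="Regularisation parameter C for the estimator", type=float, default=1.0)
    parser.add_argument("--kernel", help="Kernel for the estimator", type=str, default="rbf")
    parser.add_argument("--coef0", help="Independent term (coef0) in the kernel function", type=float, default=0.0)

    # Get args
    args = parser.parse_args()

    return args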


def main(**kwargs):
    """Train the model(s)

    Parameters
    ----------
    kwargs : dict
        Dictionary of all parsed arguments
    """
    # Logging
    # - Autologging works with (0.22.1 <= scikit-learn <= 1.1.3)
    # - See: https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog
    mlflow.sklearn.autolog()

    # Setup
    X_train, X_test, y_train, y_test = get_data(kwargs.get("data"))
    hyperparameters = get_hyperparameters(**kwargs)

    # Train
    model = train_model(hyperparameters, X_train, y_train)

    # Output
    save_outputs(model, kwargs.get("outputs_model"), X_test, y_test, kwargs.get("outputs_test_data"))
    

if __name__ == "__main__":
    """Entrypoint for training the model(s)"""
    main(**vars(parse_args()))
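
A side note on save_outputs(): it copies the locally saved model into the output folder with copy_tree from distutils, and distutils was removed from the standard library in Python 3.12. The sketch below shows the same copy step with shutil.copytree, assuming Python 3.8 or later for the dirs_exist_ok parameter. copy_model_dir is a hypothetical helper name, and this is not known to be related to the error above.

```python
import shutil
from pathlib import Path


def copy_model_dir(local_dir, model_dir):
    """Copy everything under local_dir into model_dir and list what was copied."""
    # dirs_exist_ok=True (Python 3.8+) lets the destination exist already,
    # for example when the output folder was created before the script ran.
    shutil.copytree(local_dir, model_dir, dirs_exist_ok=True)
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in Path(model_dir).rglob("*")
        if p.is_file()
    )
```

Returning the list of copied files makes it easy to log what ended up in the output folder at the end of a trial.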
