How to format the file path in an MLTable for Azure Machine Learning uploaded during a pipeline job?


How should the path to a (.csv) file be expressed in an MLTable file that is created in a local folder but then uploaded as part of a pipeline job?

I'm following the Jupyter notebook automl-forecasting-task-energy-demand-advance from the azureml-examples repo (article and notebook). This example has an MLTable file, shown further below, that references a .csv file with a relative path. In the pipeline, the MLTable folder is then uploaded so that it is accessible to a remote compute (a few things are omitted for brevity):

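from azure.ai.ml import Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import AmlCompute

# ml_client, compute_name and exp_name are defined earlier (omitted for brevity)

# Local folder containing the MLTable file and the .csv it references
my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)

compute = AmlCompute(
    name=compute_name, size="STANDARD_D2_V2", min_instances=0, max_instances=4
)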

forecasting_job = automl.forecasting(
    compute=compute_name,  # name of the compute target we created above
    # name="dpv2-forecasting-job-02",
    experiment_name=exp_name,
    training_data=my_training_data_input,
    # validation_data = my_validation_data_input,
    target_column_name="demand",
    primary_metric="NormalizedRootMeanSquaredError",
    n_cross_validations="auto",
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)

returned_job = ml_client.jobs.create_or_update(
    forecasting_job
)

ml_client.jobs.stream(returned_job.name)

But running this gives the following error:

Error message: Encountered user error while fetching data from Dataset. Error: UserErrorException: Message: MLTable yaml schema is invalid: Error Code: Validation Validation Error Code: Invalid MLTable Validation Target: MLTableToDataflow Error Message: Failed to convert a MLTable to dataflow uri path is not a valid datastore uri path | session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8 InnerException None ErrorResponse { "error": { "code": "UserError", "message": "MLTable yaml schema is invalid: \nError Code: Validation\nValidation Error Code: Invalid MLTable\nValidation Target: MLTableToDataflow\nError Message: Failed to convert a MLTable to dataflow\nuri path is not a valid datastore uri path\n| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8" } }

The MLTable file is:

paths:
  - file: ./nyc_energy_training_clean.csv
transformations:
  - read_delimited:
        delimiter: ','
        encoding: 'ascii'
  - convert_column_types:
      - columns: demand
        column_type: float
      - columns: precip
        column_type: float
      - columns: temp
        column_type: float

How am I supposed to run this? Thanks in advance!

There is 1 best solution below

For a remote path, use one of the cloud URI formats that Azure Machine Learning supports; the documentation on creating data assets lists them.

It's important to note that the path specified in the MLTable file must be a valid path in the cloud, not just a valid path on your local machine.
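
As a rough illustration of what a cloud path could look like, the sketch below builds a short-form datastore URI of the shape `azureml://datastores/<datastore_name>/paths/<path>` and renders an MLTable file around it. It assumes the .csv has already been uploaded to that location; the datastore name `workspaceblobstore`, the path `data/nyc_energy_training_clean.csv`, and the helper functions are hypothetical and only here for illustration.

```python
def datastore_uri(datastore: str, relative_path: str) -> str:
    """Build a short-form Azure ML datastore URI for a file already in the cloud."""
    cleaned = relative_path
    # Drop a leading "./" and any leading "/" so the path is relative
    # to the root of the datastore.
    if cleaned.startswith("./"):
        cleaned = cleaned[2:]
    cleaned = cleaned.lstrip("/")
    return f"azureml://datastores/{datastore}/paths/{cleaned}"


def render_mltable(file_uri: str) -> str:
    """Return MLTable YAML text that reads one delimited file from file_uri."""
    return (
        "paths:\n"
        f"  - file: {file_uri}\n"
        "transformations:\n"
        "  - read_delimited:\n"
        "      delimiter: ','\n"
        "      encoding: 'ascii'\n"
    )


# Hypothetical datastore name and path; replace with where the .csv really lives.
uri = datastore_uri("workspaceblobstore", "./data/nyc_energy_training_clean.csv")
mltable_text = render_mltable(uri)
print(mltable_text)
```

The printed text can be saved as the `MLTable` file in the folder passed to `Input(type=AssetTypes.MLTABLE, ...)`. The `convert_column_types` entries from the question would be appended under `transformations` unchanged.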
