How is the path to a (.csv) file to be expressed in a MLTable file that is created in a local folder but then uploaded as part of a pipline job?
I'm following the Jupyter notebook automl-forecasting-task-energy-demand-advance from the azuerml-examples repo (article and notebook). This example has a MLTable file as below referencing a .csv file with a relative path. Then in the pipeline the MLTable is uploaded to be accessible to a remote compute (a few things are omitted for brevity)
my_training_data_input = Input(
type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)
compute = AmlCompute(
name=compute_name, size="STANDARD_D2_V2", min_instances=0, max_instances=4
)
forecasting_job = automl.forecasting(
compute=compute_name, # name of the compute target we created above
# name="dpv2-forecasting-job-02",
experiment_name=exp_name,
training_data=my_training_data_input,
# validation_data = my_validation_data_input,
target_column_name="demand",
primary_metric="NormalizedRootMeanSquaredError",
n_cross_validations="auto",
enable_model_explainability=True,
tags={"my_custom_tag": "My custom value"},
)
returned_job = ml_client.jobs.create_or_update(
forecasting_job
)
ml_client.jobs.stream(returned_job.name)
But running this gives the error
Error meassage: Encountered user error while fetching data from Dataset. Error: UserErrorException: Message: MLTable yaml schema is invalid: Error Code: Validation Validation Error Code: Invalid MLTable Validation Target: MLTableToDataflow Error Message: Failed to convert a MLTable to dataflow uri path is not a valid datastore uri path | session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8 InnerException None ErrorResponse { "error": { "code": "UserError", "message": "MLTable yaml schema is invalid: \nError Code: Validation\nValidation Error Code: Invalid MLTable\nValidation Target: MLTableToDataflow\nError Message: Failed to convert a MLTable to dataflow\nuri path is not a valid datastore uri path\n| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8" } }
paths:
- file: ./nyc_energy_training_clean.csv
transformations:
- read_delimited:
delimiter: ','
encoding: 'ascii'
- convert_column_types:
- columns: demand
column_type: float
- columns: precip
column_type: float
- columns: temp
column_type: float
How am I supposed to run this? Thanks in advance!
For Remote PATH you can use the below and here is the document for create data assets.
It's important to note that the path specified in the MLTable file must be a valid path in the cloud, not just a valid path on your local machine.