I am trying out the AWS SageMaker Data Wrangler example "Hotel Booking Demand Example" locally, so that I can save on the cost of Data Wrangler.
Here is what I have done so far. I created the flow from the above example in my account, then exported it with the option "Export to Inference Pipeline". I downloaded the Python notebook into my Visual Studio Code environment and configured it to execute in local pipeline mode (sess = LocalPipelineSession()). All steps are fine, but the "DataWranglerProcessingStep" step fails with the following error:
Popping out 'ProcessingJobName' from the pipeline definition by default since it will be overridden at pipeline execution time. Please utilize the PipelineDefinitionConfig to persist this field in the pipeline definition if desired.
Popping out 'TrainingJobName' from the pipeline definition by default since it will be overridden at pipeline execution time. Please utilize the PipelineDefinitionConfig to persist this field in the pipeline definition if desired.
Popping out 'ProcessingJobName' from the pipeline definition by default since it will be overridden at pipeline execution time. Please utilize the PipelineDefinitionConfig to persist this field in the pipeline definition if desired.
Popping out 'TrainingJobName' from the pipeline definition by default since it will be overridden at pipeline execution time. Please utilize the PipelineDefinitionConfig to persist this field in the pipeline definition if desired.
Popping out 'ProcessingJobName' from the pipeline definition by default since it will be overridden at pipeline execution time. Please utilize the PipelineDefinitionConfig to persist this field in the pipeline definition if desired.
Network configuration is not supported in local mode.
Starting execution for pipeline pipeline-flow-06-16-52-58-a6dd87c2. Execution ID is 909630e1-331b-4ee7-a74f-f49779747ab2
Starting pipeline step: 'DataWranglerProcessingStep'
Using the short-lived AWS credentials found in session. They might expire while running.
Pipeline step 'DataWranglerProcessingStep' FAILED. Failure message is: KeyError: 'entrypoint'
Pipeline execution 909630e1-331b-4ee7-a74f-f49779747ab2 FAILED because step 'DataWranglerProcessingStep' failed.
Has anyone tried to run a Data Wrangler flow in local mode?
I validated the flow and then tried to execute it locally on my machine. I expected the run to first download the container image ("663277389841.dkr.ecr.us-east-1.amazonaws.com/sagemaker-data-wrangler-container:3.x") to my machine and then execute the whole pipeline locally, but the image download never happens.
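For anyone who wants to reproduce the image check, here is a minimal sketch of how one could confirm whether the container image is present on the local machine. It assumes the Docker CLI is installed; the helper names parse_ecr_uri and image_present_locally are made up for illustration and are not part of the SageMaker SDK. It only inspects the local image store and does not try to pull anything.

```python
import re
import shutil
import subprocess

# Image URI copied from the post above; the "3.x" tag is kept exactly as written.
IMAGE_URI = (
    "663277389841.dkr.ecr.us-east-1.amazonaws.com/"
    "sagemaker-data-wrangler-container:3.x"
)

# Standard private ECR URI layout: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_uri(uri):
    """Split an ECR image URI into account, region, repository and tag."""
    match = ECR_URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Not a recognised ECR image URI: {uri!r}")
    return match.groupdict()


def image_present_locally(uri):
    """Report whether `docker image inspect` finds the image locally.

    Returns None when the docker CLI is not on PATH. Returns False both when
    the image is missing and when the Docker daemon cannot be reached.
    """
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "image", "inspect", uri],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(parse_ecr_uri(IMAGE_URI))
    print("present locally:", image_present_locally(IMAGE_URI))
```

If this reports that the image is missing, a manual pull would normally need an ECR login first (aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 663277389841.dkr.ecr.us-east-1.amazonaws.com). Whether my account is allowed to pull this particular image is something I have not verified.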
Thank you.