I have packaged my training code as a Python package and am able to run it as a custom training job on Vertex AI. Now I want to schedule this job to run, say, every 2 weeks and re-train the model. The Scheduling settings in the CustomJobSpec allow only two fields, "timeout" and "restartJobOnWorkerRestart", so recurring runs are not possible through the CustomJobSpec itself. One way I could think of to achieve this is to create a Vertex AI pipeline with a single step using the "CustomPythonPackageTrainingJobRunOp" Google Cloud Pipeline Component and then schedule the pipeline to run as I see fit. Are there better alternatives to achieve this?
Edit:
I was able to schedule the custom training job using Cloud Scheduler, but I found the create_schedule_from_job_spec method of the AIPlatformClient, used with a Vertex AI pipeline, very easy to use. The steps I took to schedule the custom job using Cloud Scheduler in GCP are as follows (link to the Google docs):
- Set target type to HTTP
- For the URL that specifies the custom job, I followed this link
- For the authentication, under Auth header, I selected "Add OAuth token"
You also need a Cloud Scheduler service account with the "Cloud Scheduler Service Agent" role granted to it in your project. Although the docs say this should have been set up automatically if you enabled the Cloud Scheduler API after March 19, 2019, this was not the case for me and I had to add the service account with the role manually.
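To make the Scheduler configuration concrete, here is a sketch of the HTTP target the job points at. The project, region, bucket, module, and image names are placeholders, not values from my setup; the body follows the CustomJob REST schema:

```python
# Sketch of the HTTP target for the Cloud Scheduler job.
# PROJECT_ID, REGION, and the bucket/module/image URIs are placeholders.
import json

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

# The "URL" field of the Scheduler job: the customJobs REST endpoint.
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/customJobs"
)

# The POST body: a CustomJob with one worker pool running the
# packaged trainer in a pre-built training container.
body = {
    "displayName": "scheduled-training-job",
    "jobSpec": {
        "workerPoolSpecs": [
            {
                "machineSpec": {"machineType": "n1-standard-4"},
                "replicaCount": "1",
                "pythonPackageSpec": {
                    "executorImageUri": "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
                    "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],  # placeholder
                    "pythonModule": "trainer.task",                        # placeholder
                },
            }
        ]
    },
}

print(url)
print(json.dumps(body, indent=2))
```

Cloud Scheduler then POSTs this body to the URL on the cron schedule, with the OAuth token of the service account attached.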
As per your requirement, these are the possible ways to schedule the job:
1. Cloud Composer
Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows.
To schedule your job every two weeks using Composer, write a DAG that wraps the training job and set its schedule using the unix-cron string format (* * * * *); the five fields are minute, hour, day of month, month, and day of week. In your case, to run at midnight on the 1st and 15th of every month (roughly every two weeks), the cron expression would be:
0 0 1,15 * *
You can pass the parameters required by the custom job to the PythonOperator using its op_args and op_kwargs arguments.
Once the DAG file is written, push it to the dags/ folder in the Composer environment's bucket.
You can check the status of the scheduled DAG in the Airflow UI.
The scheduled DAG file would look like this:
sample_dag.py:
2. Cloud Scheduler: To schedule the job using Cloud Scheduler, create an HTTP-target job that calls the Vertex AI customJobs REST endpoint and authenticates with an OAuth token, as described in the edit to the question above.
3. Scheduling a recurring pipeline run using the Kubeflow Pipelines SDK:
You can schedule a recurring pipeline run using Python and the Kubeflow Pipelines SDK, via the AIPlatformClient's create_schedule_from_job_spec method.
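A minimal sketch of that approach, assuming the pipeline has already been compiled to training_pipeline.json and that kfp < 2 is installed (the kfp.v2.google.client module this method lives in was removed in kfp 2.x); project, region, and bucket names are placeholders:

```python
# Sketch: create a recurring run from a compiled pipeline spec.
# Assumes kfp < 2; project/region/bucket names are placeholders.

def schedule_pipeline(job_spec_path: str = "training_pipeline.json") -> None:
    """Schedule the compiled pipeline to run on the 1st and 15th of each month."""
    # Imported here so the sketch stays importable without kfp installed.
    from kfp.v2.google.client import AIPlatformClient

    client = AIPlatformClient(project_id="my-project", region="us-central1")
    client.create_schedule_from_job_spec(
        job_spec_path=job_spec_path,
        schedule="0 0 1,15 * *",       # 1st and 15th of every month
        time_zone="America/New_York",  # placeholder
        pipeline_root="gs://my-bucket/pipeline-root",
    )
```

Under the hood this method creates a Cloud Scheduler job for you, so it wraps option 2 in a few lines of Python.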