GCloud Ml-engine: training output directory

I submitted a training job from my local machine with a command like this:

gcloud ml-engine jobs submit training task43 --module-name=train.train --config=config.yaml --job-dir=gs://root-album-8512 --package-path=train --region=asia-east1 --staging-bucket=gs://root-album-8512

After training, where can I find my training output directory? My job log says "Saved model checkpoint to /user_dir/runs/1503579423/checkpoints/model-227400".

But I don't know where that is. When I check my Cloud Storage bucket, there is no such directory.

There is 1 solution below

The value of --job-dir is passed to your training code as the command-line argument --job-dir. Your code needs to read that argument and pass its value to the model saver so that checkpoints are written to that location.

If you don't give the model saver a GCS location, it saves to a local directory on the training machine (such as the /user_dir/runs/... path in your log) instead of your GCS bucket, and those files are not available after the job ends.
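
As a rough sketch of what that could look like in the trainer module: the function names parse_args and checkpoint_prefix and the "checkpoints/model" layout below are illustrative assumptions, not part of ML Engine or of your code. Only the --job-dir argument itself comes from the service.

```python
import argparse
import posixpath


def parse_args(argv=None):
    """Read the --job-dir value that the training service forwards to the trainer."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-dir",
        required=True,
        help="Output location, e.g. a gs:// path in your bucket",
    )
    # parse_known_args ignores any other flags the trainer may receive.
    args, _ = parser.parse_known_args(argv)
    return args


def checkpoint_prefix(job_dir):
    """Build a checkpoint path under job_dir (hypothetical layout).

    posixpath.join always uses forward slashes, which is what gs:// paths need.
    """
    return posixpath.join(job_dir, "checkpoints", "model")
```

In the trainer you would call parse_args() with no arguments so it reads the real command line, then hand the prefix to the saver instead of a local path, for example saver.save(sess, checkpoint_prefix(args.job_dir), global_step=step) with a TensorFlow 1.x tf.train.Saver. With the command from the question, the checkpoints would then appear under gs://root-album-8512/checkpoints/.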