I don't know if this question has been covered earlier, but here goes: I have a notebook that I can run manually using the 'Run' button in the notebook, or as a job.
The runtime for running the notebook directly is roughly 2 hours, but when I execute it as a job the runtime balloons to around 8 hours. The piece of code that takes the longest is a call to applyInPandas, which in turn calls a pandas_udf that trains an auto_arima model.
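For context, the pattern is roughly the sketch below. This is not my actual code; the column names, grouping key, forecast horizon, and the `fit_arima` helper are illustrative assumptions:

```python
# Minimal sketch of the applyInPandas + auto_arima pattern (illustrative names only).
import pandas as pd
from pmdarima import auto_arima

def fit_arima(pdf: pd.DataFrame) -> pd.DataFrame:
    # One model fit per group; each group runs on a single executor core.
    pdf = pdf.sort_values("ds")
    model = auto_arima(pdf["y"], suppress_warnings=True)
    forecast = model.predict(n_periods=30)
    return pd.DataFrame({
        "series_id": pdf["series_id"].iloc[0],
        "step": range(1, 31),
        "yhat": forecast,
    })

result = (
    df.groupBy("series_id")
      .applyInPandas(fit_arima, schema="series_id string, step int, yhat double")
)
```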
Can anyone help me figure out what might be happening? I am clueless.
Thanks!
When running a notebook as a Job, you have to define a "job cluster" (as opposed to an "interactive cluster", which you attach the notebook to and hit Run on). There is a possible delay while the job cluster is spun up, but that usually takes less than 10 minutes. Other than that, make sure your job cluster's spec is the same as your interactive cluster's (i.e. same worker type, worker size, autoscaling, etc.).
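For reference, the job cluster is defined by the `new_cluster` block in the job settings. A rough sketch of the fields worth aligning with the interactive cluster is below; all values are placeholders, not a recommendation:

```python
# Illustrative job-cluster spec (the Jobs API "new_cluster" block) as a Python dict.
# Copy these values from the interactive cluster's configuration page.
job_cluster_spec = {
    "spark_version": "13.3.x-scala2.12",        # same Databricks Runtime version
    "node_type_id": "Standard_DS4_v2",          # same worker node type
    "driver_node_type_id": "Standard_DS4_v2",   # same driver node type
    "autoscale": {"min_workers": 2, "max_workers": 8},  # same autoscaling range
    "spark_conf": {},                           # copy any custom Spark conf as well
}
```

If the specs match and the job still takes far longer, compare the Spark UI for both runs (number of tasks, shuffle sizes, executor count) to see where the extra time goes.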