My situation is as follows. We have:

- an "image" with all our software dependencies plus our in-house Python package
- a "pod" into which that image is loaded on demand (a Kubernetes pod)
- a Python project with some uncommitted code of its own that leverages the in-house package

Also, please assume we cannot work on the machine (or cluster) directly (say, via a remote SSH interpreter). The cluster is multi-tenant and we want to utilize it as much as possible, so no idle time between trials. For now, please set security aside: everything is locked down on our side, so there are no issues there.
We essentially want to "distribute" the workload remotely, i.e. run a script.py (unfeasible on our local machines) without being constrained by git commits, so we can do it "on the fly". This is necessary because all of the changes are experimental in nature (think ETL/pipeline-style analysis): we want to experiment at scale, but with no bounds with regard to git.
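To make "on the fly" concrete, here is a minimal sketch of the mechanical part I have in mind: snapshot the current (possibly dirty, uncommitted) working tree into a tarball that could then be shipped into the running pod, e.g. via `kubectl cp` or an object store the pod can read. The function name and layout are hypothetical, not an existing tool:

```python
import tarfile
import tempfile
from pathlib import Path

def pack_project(project_dir: str, out_dir: str) -> Path:
    """Snapshot the current source tree (committed or not) into a tarball."""
    archive = Path(out_dir) / "snapshot.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname flattens the absolute path so the pod sees a stable root
        tar.add(project_dir, arcname="project")
    return archive

# demo with a throwaway project directory standing in for the real repo
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    (Path(src) / "script.py").write_text("print('experimental change')\n")
    path = pack_project(src, dst)
    with tarfile.open(path) as tar:
        names = tar.getnames()
    print(names)  # includes 'project/script.py'
```

The delivery step (how the tarball reaches the pod and gets onto `sys.path`) is exactly the part I am unsure about, hence the question.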
I tried dill, but I could not manage to make it work (probably due to the structure of the code). Ideally, I would like to replicate the concept MLeap applies to ML pipelines for Spark, but on a much smaller scale: basically packaging, with little to no constraints.
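For context, the kind of round trip I am after looks roughly like this (shown with the stdlib `pickle` so it runs anywhere; `dill` exposes the same `dumps`/`loads` API but can additionally serialize lambdas, closures, and interactively defined functions, which plain pickle cannot):

```python
import pickle

def square(x):
    # a plain module-level function: pickle stores it *by reference*
    # (module + qualified name), so the remote interpreter must already
    # have this module importable -- which is precisely the constraint
    # I am trying to escape; dill can serialize the function *by value*
    return x * x

payload = pickle.dumps(square)    # bytes we could ship to the pod
restored = pickle.loads(payload)  # what the remote side would do
print(restored(7))                # 49
```

With our real code the failure presumably comes from this by-reference vs. by-value distinction interacting badly with the package structure, but I have not pinned it down.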
What would be the preferred route for this use case?