Prefect: How to Make a Worker Activate a Flow-Specific venv Before a Flow Run?


Preface: this GitHub issue is semi-related

I’m wondering what the correct way is to associate a flow with a venv so that all of its flow runs execute inside that venv. I’m using a Prefect worker and a work pool to execute and hold flows, respectively.

My impression from reading through some GitHub issues is that it’s better to create a venv ahead of time with all flow dependencies installed, then run prefect worker start from inside that venv. This approach is unfavorable because:

  • It requires that flows are organized into work pools based on having compatible library dependencies.
  • Or, it requires that work pools and workers have a one-to-one relationship with flows, to avoid needing to match flows based on dependencies.
  • In either case, workers must be built with knowledge of these dependencies ahead of time.

If this isn’t the best architecture, please correct my approach, but I’m looking for a way to give each flow a one-to-one relationship with a virtual environment. This could be accomplished by having the worker git pull the latest flow code, execute python3 -m venv venv && source venv/bin/activate, and finally pip install -r requirements.txt before running the flow. Even better would be a way to cache on the requirements.txt file and only re-create the venv when that file has changed.
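To make the caching idea concrete, here is a minimal sketch of the step I have in mind, in plain Python and independent of Prefect. It assumes the flow's repository has a requirements.txt at its root; the names ensure_venv, env_dir_for, venv_python and the cache_root directory are hypothetical, and the venv is keyed on a hash of the requirements file so it is only rebuilt when that file changes.

```python
import hashlib
import subprocess
import sys
import venv
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA-256 hex digest of the requirements file's raw bytes."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def env_dir_for(requirements: Path, cache_root: Path) -> Path:
    """Directory of the venv that corresponds to this exact requirements file."""
    return cache_root / requirements_digest(requirements)[:16]


def venv_python(env_dir: Path) -> Path:
    """Interpreter inside a venv (Windows and POSIX layouts differ)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def ensure_venv(project_dir: Path, cache_root: Path) -> Path:
    """Create and populate the venv only if no finished one exists for this hash.

    Hypothetical helper: project_dir is the cloned flow repository and
    cache_root is any directory that persists between flow runs.
    """
    requirements = project_dir / "requirements.txt"
    env_dir = env_dir_for(requirements, cache_root)
    marker = env_dir / ".ready"  # written last, so a half-built venv is rebuilt
    if not marker.exists():
        venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
        subprocess.run(
            [str(venv_python(env_dir)), "-m", "pip", "install",
             "-r", str(requirements)],
            check=True,
        )
        marker.write_text(requirements_digest(requirements))
    return venv_python(env_dir)


# Example call (paths are placeholders):
# python_bin = ensure_venv(Path("/tmp/flow-repo"), Path("/tmp/venv-cache"))
# subprocess.run([str(python_bin), "my_flow.py"], check=True)
```

Calling the venv's interpreter directly stands in for source venv/bin/activate, since activation only affects the current shell. What I don't know is where a step like this could be hooked into the worker so that it runs after the clone and before the flow starts.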

I’ve looked into using the work pool’s working directory and command configurations. Unfortunately, there doesn’t seem to be a way to get the worker inside the flow’s project directory so that the command can reference requirements.txt before the flow runs.

I’m stuck at this point. What’s the optimal way of handling dependencies for flows with Prefect? Must workers be pre-built with knowledge of the dependencies of every flow in their assigned work pool, and must flows be organized into work pools based on having compatible dependencies?

For reference, I’m running the worker as a Docker container. Flow code is hosted remotely in a git repository, and as it stands the worker can git clone the repo prior to every run.
