How to properly manage external dependencies in a Prefect flow?

I would like to implement one central prefect project, where over time it will be possible to add flows independent of each other. The structure of the project is something like this:

prefect/
├── src/
│   ├── flows/
│   │   ├── test_pack1/
│   │   │   ├── common/
│   │   │   │   ├── __init__.py
│   │   │   │   └── test_module.py
│   │   │   ├── .env
│   │   │   ├── __init__.py
│   │   │   ├── requirements.txt
│   │   │   └── test_pack1_flow.py
│   │   ├── test_pack2/
│   │   │   ├── __init__.py
│   │   │   ├── .env
│   │   │   ├── requirements.txt
│   │   │   └── test_pack2_flow.py
│   │   ├── __init__.py
│   │   └── Dockerfile
│   ├── utilities/
│   │   ├── __init__.py
│   │   ├── storage.py
│   │   ├── builder.py
│   │   ├── executor.py
│   │   └── run_config.py
│   ├── .env
│   ├── __init__.py
│   └── main.py
├── .gitignore
├── poetry.lock
└── pyproject.toml

I would like each flow in the flows/ folder to be independent of the central project and created as a separate docker container.

builder.py at startup searches for all flows in flows/ folder, sets a specific configuration and registers them on the server.

But I ran into a problem with importing third-party packages. Say that test_pack1/requirements.txt contains SQLAlchemy==1.4.34, that test_pack1/common/test_module.py contains import sqlalchemy, and that test_pack1/test_pack1_flow.py has a @task that uses a function from test_module.py. When the FlowBuilder class looks for the flow variable in test_pack1_flow.py, it calls flow = extract_flow_from_file(str(flow_module)). At this step a ModuleNotFoundError is raised, because the dependency is not installed in the central Prefect application (it is not listed in pyproject.toml). It will of course be available inside the Docker container that is built by flow.register(), but the import fails before that point. How can I handle this step? Or am I doing something wrong?

I use Docker Storage, Docker Run and Local Executor.

There is 1 answer below

This is a matter of packaging the flow's code dependencies, and it is definitely doable. The question was cross-posted on Prefect Discourse, where I responded in much more detail.

Here is a short summary:

  • You can use the prefect register CLI instead of building custom builder.py functionality that loops over flows
  • You can have a custom utility function that sets a different storage and run_config based on your environment (dev/stage/prod etc.)
  • To handle dependencies that exist in the Docker image but not in your local environment, you can put them in a custom package defined with setup.py, which can then be installed both locally and in the image
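
As a rough illustration of the last point, here is a minimal sketch of helper code that a setup.py could use. It assumes the layout from the question (one requirements.txt per folder under src/flows/) and exposes each flow pack's requirements as an optional extra, so that something like pip install -e ".[test_pack1]" would make SQLAlchemy importable locally at registration time. The function names read_requirements, collect_extras and build_setup_kwargs, the package name prefect-flows and the version number are all hypothetical. This is one possible reading of the approach, not the exact setup from the Discourse thread.

```python
from pathlib import Path
from typing import Any, Dict, List


def read_requirements(path: Path) -> List[str]:
    """Return the requirement specifiers from a requirements.txt file.

    Blank lines, comment lines and pip option lines (such as "-r other.txt")
    are skipped, and trailing inline comments are removed.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    stripped = (line.split(" #", 1)[0].strip() for line in lines)
    return [line for line in stripped if line and not line.startswith(("#", "-"))]


def collect_extras(flows_dir: Path) -> Dict[str, List[str]]:
    """Map each flow pack folder name to the requirements it declares."""
    extras: Dict[str, List[str]] = {}
    for req_file in sorted(flows_dir.glob("*/requirements.txt")):
        extras[req_file.parent.name] = read_requirements(req_file)
    return extras


def build_setup_kwargs(project_root: Path) -> Dict[str, Any]:
    """Build keyword arguments for setuptools.setup().

    Assumption: flows live under <project_root>/src/flows, as in the question.
    The package name and version are placeholders.
    """
    return {
        "name": "prefect-flows",  # hypothetical package name
        "version": "0.1.0",  # hypothetical version
        "extras_require": collect_extras(project_root / "src" / "flows"),
    }
```

A setup.py in the project root could then call setuptools.setup(packages=setuptools.find_packages(), **build_setup_kwargs(Path(__file__).parent)). Since the project already uses Poetry, the same idea could instead be expressed as extras in pyproject.toml.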