My situation: I have been trying to distribute the load of some Python-based data science pipelines and, after much searching and a previous Q&A (Scale out jobs with high memory consumption but low computing power within AWS and using Docker: finding best solution), I have come to the conclusion that Nuclio might be a good fit, most likely running on top of Kubernetes. Still, a major question remains.

Say I want to do this:
@run_with_nuclio
def step_1(context):
    # in the Docker image
    import pandas as pd

    # in my project, using submodules
    from my_big_submodule_1 import do_this
    from my_big_submodule_2 import do_that

    do_this()
    do_that()
I had major "context" problems in the past, so right now my super-bright, super-safe solution is to package the project (literally zipping up all of its .py files), pass the archive to the function to be executed in the remote environment, unzip it there, and then run the function (roughly the sketch below).
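For illustration, the workaround amounts to something like this (a rough sketch; the function names and paths are mine, purely illustrative):

import sys
import zipfile
from pathlib import Path

def bundle_project(root: str, archive: str) -> None:
    # Local side: collect every .py file in the project into one zip
    with zipfile.ZipFile(archive, "w") as zf:
        for path in Path(root).rglob("*.py"):
            zf.write(path, path.relative_to(root))

def unpack_project(archive: str, target: str) -> None:
    # Remote side: unzip and make the code importable before running
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    sys.path.insert(0, target)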
This is "good" because it gives me immense flexibility, but this spaghetti solution is not the way to go.

Is there a way to do this by leveraging the Nuclio framework? (Functions seem to carry all their info within themselves, never calling external packages that are not present among the "classic" packages.)
If I understand your problem correctly, you're looking for a way to use external Python modules from your project within your Nuclio function?

If so, the solution is less Nuclio-centric and more Docker-centric: you need to make sure that your Python modules are available within the Docker environment. There are two ways to do this:
Bake your code into your Docker image (easiest and recommended). From there, you should be able to use it within your Nuclio function. There's a page in the Nuclio docs on deploying a function from a Dockerfile: https://nuclio.io/docs/latest/tasks/deploy-functions-from-dockerfile/. This approach is simpler, as everything is in your Docker image; however, if your code changes, you will need to rebuild the image (this can be automated with a CI/CD pipeline).
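The gist of the bake-in approach is a Dockerfile along these lines (a rough sketch only: the base image and target paths are placeholders, and the actual Nuclio processor wiring is described in the docs linked above):

# Placeholder base image -- in practice, follow the Dockerfile structure
# from the Nuclio docs above, which wires in the Nuclio processor.
FROM python:3.9

# Dependencies that should live in the image
RUN pip install pandas

# Bake the project submodules into the image so that
# `from my_big_submodule_1 import do_this` resolves at runtime
COPY my_big_submodule_1/ /opt/nuclio/my_big_submodule_1/
COPY my_big_submodule_2/ /opt/nuclio/my_big_submodule_2/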
Mount a directory containing your Python modules into your container with a K8s PVC (or Docker volume), add it to the Python path, and use it as expected. This approach is more complex and depends on your K8s environment. However, since the code is just mounted into the container, you do not need to rebuild your image when your code changes.
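The handler side of the mount approach could look something like this (a minimal sketch: the mount point /mnt/project-code is an assumption, as are the module names):

import sys

# Hypothetical mount point of the PVC / Docker volume carrying your code
MODULE_DIR = "/mnt/project-code"
if MODULE_DIR not in sys.path:
    sys.path.insert(0, MODULE_DIR)

# With the mount on the path, the project submodules resolve normally
from my_big_submodule_1 import do_this
from my_big_submodule_2 import do_that

def handler(context, event):
    do_this()
    do_that()

Note that the sys.path tweak runs at import time, so it happens once per worker process rather than on every invocation.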