Marc Gris
07/10/2023, 4:21 PM
.venv_model_*a*/bin/python -m kedro run --tags=model_*a*
then
.venv_model_*b*/bin/python -m kedro run --tags=model_*b*
etc..
But this, IMHO, is really far from an optimal “dev-comfort-centric” workflow…
Hence my initial request / question:
Is there a mechanism that would allow passing a path to a venv when creating a node / pipeline?
(I must confess that, in my naïveté, I thought this would be “quite easy” using a before_node_run callback… But I quickly had to recognize that my skills were too meager for the task 😅 )
Many thanks in advance for taking the time to consider this suggestion / request.
Regards
Marc
Deepyaman Datta
07/10/2023, 4:38 PM
src/requirements.txt, pin them, and struggle when they can't resolve the dependencies for a massive pipeline in a single environment. The impression I get is, 80-90% of DS users also don't want to define requirements at the modular pipeline level, or manage multiple environments. So far, discussions I've seen for micropackaging, etc. have talked about trying to extract the necessary requirements from the central requirements file, in order to avoid this.
So, I think by wanting to run in isolated environments like this, you're technically doing the right thing, but it's likely not something Kedro itself caters to at this time (i.e. you're a more advanced user).
So, what's the right way to do this? I guess a Kedro plugin or, as you've mentioned, an orchestrator that can already handle spinning up environments locally, be it Airflow, Prefect, etc., is the right approach. I think this could also be informed by the "right"/standardized way to deploy Kedro to orchestrators, and by making a plugin/runner that does more or less the same thing locally.
But I'd be curious to see what @Juan Luis @Nok Lam Chan @marrrcin think.
Matthias Roels
07/10/2023, 6:17 PM
Iñigo Hidalgo
07/11/2023, 2:13 AM
Juan Luis
07/11/2023, 8:47 AM
marrrcin
07/11/2023, 10:01 AM
if dependencies are so unresolvable that you need to have separate environments to live in, these pipelines really should be thought of as totally independent entities
and also agree with @Matthias Roels that it’s a similar case to running something from a monorepo. IMHO, Kedro is not a tool to handle (or work around) the limitations of Python in this area. At this level of complexity, you should probably have an orchestrator on top that connects multiple Kedro pipelines (= separate projects with isolated requirements) at the “business logic” level. I would definitely go with containerization + orchestration with something like Airflow / Argo or even Kubeflow.
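For reference, the "one venv per model, run one after the other" workaround from the original question can be scripted as a very rough local stand-in for an orchestrator. This is only a sketch under assumptions: it presumes virtualenvs named `.venv_model_<name>` exist in the current working directory and that tags follow the `model_<name>` pattern from the question. The helper names `build_command` and `run_all` are hypothetical and not part of Kedro.

```python
import subprocess
from typing import Callable, Iterable, List


def build_command(model: str) -> List[str]:
    """Build the `kedro run` command for one model, using that model's own venv.

    Assumption: the venv lives at `.venv_model_<model>` relative to the
    current working directory (POSIX layout, i.e. `bin/python`).
    """
    python = f".venv_model_{model}/bin/python"
    return [python, "-m", "kedro", "run", f"--tags=model_{model}"]


def run_all(
    models: Iterable[str],
    runner: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Run each model's tagged nodes sequentially, each in its own interpreter.

    `runner` is injectable so the sequencing can be checked without Kedro
    installed; by default it is `subprocess.run`, and `check=True` stops the
    sequence at the first failing run.
    """
    executed = []
    for model in models:
        cmd = build_command(model)
        runner(cmd, check=True)
        executed.append(cmd)
    return executed
```

Calling `run_all(["a", "b"])` from the project root would launch the two commands from the question in order. It does nothing about passing data between environments beyond what the catalog already persists, which is part of why a real orchestrator is suggested above.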
Marc Gris
07/11/2023, 9:20 PM