Hi Team :party-parrot:, How can I limit the loaded...
# questions
b
Hi Team 🦜, How can I limit the loaded config for native latest kedro in 2 dimensions, e.g. `kedro run --pipeline dc_xyz --env dev`: 1. by env (`conf/dev/`) 2. by pipeline (`conf/base/data_connectors/xyz/`). Is there a simple way to achieve this double filter without much hacking?
d
do the two approaches not work in combination?
b
I think I’ve just never done either. If I understand correctly, the default is that we pick up every config object in conf if we don’t specify an environment, right? But that also means scanning through all the other entries and pipelines we don’t want to know about. We are building a setup where we only need two things: the conf for that particular pipeline, and the conf for that particular env
So somehow I need to tell kedro that my pipeline name means `conf/base/data_connectors/xyz` and apply the two filters with an `or`, not an `and`
d
so it will do an AND and it’s not easy to configure it differently
that being said, you can use the terminal here to be your friend
also running with no `--env` is the same as `kedro run --env base`
so you can do `kedro run --pipeline a & kedro run --env b`
`&` will do an or in parallel, `&&` will do an or one after the other and fail if the first fails
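For anyone less used to the shell operators mentioned above, here is a rough Python sketch of the same two behaviours. It is only an illustration: `run_sequential` and `run_parallel` are made-up helper names, and unlike a bare `a & b` in a shell, the parallel helper waits for both commands to finish so it can report their exit codes. Each command is started as its own process, so two `kedro run` commands would mean two separate initiations.

```python
import subprocess
import sys


def run_sequential(commands):
    """Roughly `a && b`: run one after the other, stop at the first failure."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # A non-zero exit code stops the chain, as `&&` would.
            return result.returncode
    return 0


def run_parallel(commands):
    """Roughly `a & b`: start every command as its own process, then wait for all."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Stand-in commands; in the thread these would be the two `kedro run` calls.
    ok = [sys.executable, "-c", "pass"]
    fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_sequential([fail, ok]))  # the second command never starts
    print(run_parallel([ok, fail]))    # both commands start regardless
```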
b
but that still means one initiation, one spark session, etc?
because if there are no performance drawbacks, we can just alias this or use as is and we are golden
d
I think it’s two parallel initiations, performance is cluster specific
n
Slightly confused here, is the question about running two separate pipelines? Or is it about Kedro loading config that isn’t necessary in this case?
b
@Nok Lam Chan definitely don’t want to run two separate pipelines. We have dev and prod environments, and we want to avoid duplicating catalogs or params if possible, and we want to avoid loading configs we don’t need. So if we have pipelines A, B and C, and environment X and Y, we want to have
• catalogs and params that are pipeline-dependent but not env-dependent (which for now we put into `base/`), and
• params that are env-dependent (which we for now put into `dev/` and `prod/`)
And if we are running pipeline A in environment X, we should process the catalog/params related to A, and the params related to X, but no B, C or Y. Does that make sense?
Basically if I’m running a pipeline in dev, just want to see dev params and the catalog entries/params needed for that one pipeline
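To make the desired "pipeline folder OR env folder" selection concrete, here is a minimal plain-Python sketch. It does not use any Kedro API and is not how Kedro resolves config. The `PIPELINE_CONF_DIRS` lookup and both function names are hypothetical; the only mapping taken from the thread is `dc_xyz` to `data_connectors/xyz`, and the `abc` paths in the example are invented for contrast.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical lookup from pipeline name to its folder under conf/base/.
# Only the dc_xyz entry comes from the thread; the mechanism is an assumption.
PIPELINE_CONF_DIRS = {"dc_xyz": "data_connectors/xyz"}


def wanted_conf_roots(pipeline: str, env: str, conf_source: str = "conf") -> list[PurePosixPath]:
    """Return the two folders to load: the pipeline's base folder and the env folder."""
    root = PurePosixPath(conf_source)
    return [root / "base" / PIPELINE_CONF_DIRS[pipeline], root / env]


def select_conf_files(files: list[str], pipeline: str, env: str, conf_source: str = "conf") -> list[str]:
    """Keep a file if it sits under EITHER wanted folder (an OR, not an AND)."""
    roots = wanted_conf_roots(pipeline, env, conf_source)
    selected = []
    for name in files:
        path = PurePosixPath(name)
        if any(root in path.parents for root in roots):
            selected.append(name)
    return selected


if __name__ == "__main__":
    files = [
        "conf/base/data_connectors/xyz/catalog.yml",
        "conf/base/data_connectors/abc/catalog.yml",
        "conf/dev/parameters.yml",
        "conf/prod/parameters.yml",
    ]
    # Mirrors `kedro run --pipeline dc_xyz --env dev` from the question.
    for name in select_conf_files(files, "dc_xyz", "dev"):
        print(name)
```

With the example list, only the `xyz` catalog and the `dev` parameters are kept; the `abc` and `prod` files are skipped. Wiring a filter like this into an actual Kedro project would still need a project-specific hook or config loader setup, which this sketch does not attempt.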