Hi Team party parrot How can I limit the loaded config for n Kedro #questions

Balazs Konig

11/29/2022, 3:09 PM

Hi Team 🦜, How can I limit the loaded config for native latest kedro in 2 dimensions, e.g. `kedro run --pipeline dc_xyz --env dev`:
1. by env (`conf/dev/`)
2. by pipeline (`conf/base/data_connectors/xyz/`)
Is there a simple way to achieve this double filter without much hacking?

datajoely

11/29/2022, 3:09 PM

do the two approaches not work in combination?

Balazs Konig

11/29/2022, 3:12 PM

I think I’ve just never done either, if I understand correctly, the default is that we pick up every config object in conf if we don’t specify an environment, right? But that will also mean scanning through all the other entries and pipelines we don’t want to know about. We are building a setup where we only need two things, the conf for that particular pipeline, and the conf for that particular env

Balazs Konig

11/29/2022, 3:13 PM

So somehow I need to tell kedro that my pipeline name means `conf/base/data_connectors/xyz`

Balazs Konig

11/29/2022, 3:14 PM

and apply the two filters with an `or`, not an `and`

datajoely

11/29/2022, 3:14 PM

so it will do an AND and it’s not easy to configure it differently

datajoely

11/29/2022, 3:14 PM

that being said, you can use the terminal here to be your friend

datajoely

11/29/2022, 3:14 PM

also running with no `--env` is the same as `kedro run --env base`

datajoely

11/29/2022, 3:15 PM

so you can do `kedro run --pipeline a & kedro run --env b`

datajoely

11/29/2022, 3:15 PM

`&` will run both in parallel, `&&` will run them one after the other and stop if the first fails

Balazs Konig

11/29/2022, 3:16 PM

but that still means one initiation, one spark session, etc?

Balazs Konig

11/29/2022, 3:16 PM

because if there are no performance drawbacks, we can just alias this or use as is and we are golden

datajoely

11/29/2022, 3:17 PM

I think it’s two parallel initiations, performance is cluster specific

Nok Lam Chan

11/29/2022, 4:00 PM

Slightly confused here, is the question about running two separate pipelines? Or is it about Kedro loading config that isn't necessary in this case?

Balazs Konig

11/29/2022, 5:15 PM

@Nok Lam Chan definitely don’t want to run two separate pipelines. We have dev and prod environments, and we want to avoid duplicating catalogs or params if possible, and we want to avoid loading configs we don’t need. So if we have pipelines A, B and C, and environments X and Y, we want to have
• catalogs and params that are pipeline-dependent but not env-dependent (which for now we put into `base/`), and
• params that are env-dependent (which we for now put into `dev/` and `prod/`)
And if we are running pipeline A in environment X, we should process the catalog/params related to A, and the params related to X, but no B, C or Y. Does that make sense?
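To make the requirement above concrete, here is a minimal sketch of the "pipeline folder OR env folder" selection, written with plain `pathlib` rather than any Kedro API. The `PIPELINE_CONF_DIRS` mapping and the `select_config_files` helper are hypothetical names made up for illustration, and the `.yml` extension is an assumption. How this would be wired into Kedro's config loading is not covered here.

```python
from pathlib import Path

# Hypothetical mapping from a pipeline name to its config subfolder under conf/base/.
PIPELINE_CONF_DIRS = {"dc_xyz": "data_connectors/xyz"}


def select_config_files(conf_root, pipeline, env):
    """Return the YAML files for one pipeline (under base/) plus one environment."""
    root = Path(conf_root)
    pipeline_dir = root / "base" / PIPELINE_CONF_DIRS[pipeline]
    env_dir = root / env

    files = []
    # Union of the two folders: pipeline-specific config first, then env-specific config.
    for folder in (pipeline_dir, env_dir):
        if folder.is_dir():
            files.extend(sorted(folder.rglob("*.yml")))
    return files
```

Calling `select_config_files("conf", "dc_xyz", "dev")` would then list only the files under `conf/base/data_connectors/xyz/` and `conf/dev/`, leaving out the other pipelines and environments.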

👍 1

Balazs Konig

11/29/2022, 5:23 PM

~~Basically if I’m running a pipeline in dev, just want to see dev params and the catalog entries/params needed for that one pipeline~~
