Hi everyone, `local/catalog.yml` does not override the `base/catalog.yml` (Kedro #questions)

Marc Gris

07/20/2023, 2:47 PM

Hi everyone, `local/catalog.yml` does not override the `base/catalog.yml`… Any idea what could cause this behavior? Thx, M.

Nok Lam Chan

07/20/2023, 2:53 PM

Which ConfigLoader are you using? And can you show the relevant config for both `yml` files? P.S. I know I still owe some answers to other questions you asked; I will get back to them.

👍🏼 1

Marc Gris

07/20/2023, 2:56 PM

oh… I think I now see what’s wrong 😅 ! Let me check that…

Marc Gris

07/20/2023, 3:01 PM

Ok… It’s a bit stupid, but, putting my ego aside, I’ll share what was wrong, because I’d love to have your advice on how to best handle this.

Marc Gris

07/20/2023, 3:03 PM

My assumption was that commenting out a dataset in the local catalog would result in this dataset not being created… But of course, since comments are not evaluated, the `{{ table }}.db_columns` entry from `base` does get created in the catalog…

Marc Gris

07/20/2023, 3:10 PM

(Nota bene: this dataset only concerns one of many pipelines.) Let’s assume that I’m working offline. Since commenting out in `local` does not impact `base`, is it possible to work on / run the other pipelines in my project:

• without commenting things out in `base`

• and without having to create a local db?

Thanks in advance for your advice / suggestions. Regards, M.

P.S.: Of course, it is not out of laziness that I’m reluctant to comment things out in `base` 😅 … It’s just that “messing with `base`” to be able to work locally feels “off”.

Nok Lam Chan

07/20/2023, 3:15 PM

I see what you mean, but would your pipeline still work if you removed this dataset?

Marc Gris

07/20/2023, 3:16 PM

Yes, 3 modular pipelines out of 4 do not rely on this dataset.

Nok Lam Chan

07/20/2023, 3:17 PM

`local` is meant to override; in this case, since you commented the entry out, there is simply nothing to override.

👍 2
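
A minimal sketch of the behavior described above, assuming the two environments are merged by top-level key with `local` taking precedence. Plain dicts stand in for the parsed YAML files, and the entry names and the `merge_environments` helper are hypothetical, not Kedro's actual implementation:

```python
# Plain dicts stand in for the parsed conf/base/catalog.yml and
# conf/local/catalog.yml; the dataset names are hypothetical.
base_catalog = {
    "db_columns": {"type": "pandas.SQLQueryDataSet", "sql": "SELECT 1"},
    "model_input": {"type": "pandas.CSVDataSet", "filepath": "data/base.csv"},
}

# "db_columns" is commented out in local, so the parsed file simply
# does not contain that key at all.
local_catalog = {
    "model_input": {"type": "pandas.CSVDataSet", "filepath": "data/local.csv"},
}


def merge_environments(base: dict, local: dict) -> dict:
    """Merge by top-level key; entries in `local` replace those in `base`."""
    return {**base, **local}


merged = merge_environments(base_catalog, local_catalog)
```

Under this assumption, `model_input` is taken from `local`, while `db_columns` survives from `base` because a commented-out entry contributes no key to the merge.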

Nok Lam Chan

07/20/2023, 3:17 PM

Does it affect you whether or not this dataset is inside your catalog, if you are not using it?

Marc Gris

07/20/2023, 3:22 PM

I’m not sure I understand your question. I’m personally not (yet) concerned with the pipeline that depends on that dataset. I work on the pipelines “downstream” from it. But my colleagues are… hence its presence in `base`… 🙂 Don’t you think that this edge case might point to the usefulness of a mechanism that would allow `local` to “ignore” things in `base`, i.e. something like

conf/local/catalog.yml

dataset:
    type: to_ignore

What do you think? 🙂 Would that deserve a feature request?

Nok Lam Chan

07/20/2023, 3:31 PM

“and without having to create a local db”

I guess I am not clear about this yet: why do you need to create a local db? From my understanding this dataset doesn’t concern your pipeline, so whether or not it exists shouldn’t affect your pipeline.

Marc Gris

07/20/2023, 3:33 PM

=> If I’m working offline and “do not mess around” with `base`… then running `kedro run -p my-pipeline-without-db` would raise an error, since the creation of the catalog is “modular-pipeline-agnostic”.

Marc Gris

07/20/2023, 3:34 PM

Even if I actually do not need the dataset for the pipeline I’m working on, kedro will still try to query the db…

👍🏼 1

Nok Lam Chan

07/20/2023, 3:39 PM

I think the more important question here is that the connection is created when the dataset is created; could we delay it until it is queried? If you can open an issue about this on GH, that would be great. As for your point about using `local` to remove certain entries, I understand the idea of not touching `base` and using `local` as the only source of change. But if the above problem is fixed, I don’t see a strong need for this. If you think this is important, feel free to open an issue for this too, but I suggest doing them separately.

Marc Gris

07/20/2023, 3:42 PM

Thanks for your comments. Good point regarding the lazy creation of datasets. 👍🏼 Will create a feature request on GH. Thx again. M

Nok Lam Chan

07/20/2023, 3:45 PM

For some reason I thought there were changes related to the connection, but I cannot find them. Cc @Iñigo Hidalgo: for the Ibis dataset, how is the connection handled?

Nok Lam Chan

07/20/2023, 3:46 PM

I found an old PR that actually changed it to a singleton connection. https://github.com/kedro-org/kedro/pull/1163

Nok Lam Chan

07/20/2023, 3:46 PM

Feel free to create the issue, I will link the issues properly and discuss with team.

Marc Gris

07/20/2023, 3:46 PM

👍🏼 🙂

Iñigo Hidalgo

07/20/2023, 4:11 PM

For the Ibis dataset the connection is created on catalog initialization, when the first instance of the dataset is initialized. Subsequent instances will use the same connection. That’s only my current implementation, I never evaluated the time it takes to generate the connection, but I do think that creating a new connection on load would be kinda heavy

Iñigo Hidalgo

07/20/2023, 4:12 PM

I haven’t followed the whole discussion, but could you create a dummy dataset type to override the type locally which just doesn’t do anything?

👍🏼 1

👀 1

Iñigo Hidalgo

07/20/2023, 4:12 PM

Obviously it would break if you try to run a pipeline with it

Iñigo Hidalgo

07/20/2023, 4:13 PM

But if you just need a placeholder to run some other part of your code it should work
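
A rough sketch of what such a placeholder could look like. The class name `PlaceholderDataSet` is hypothetical, and it is written as a plain class so it runs on its own; in a real project it would presumably subclass Kedro's abstract dataset base class and implement the same three methods:

```python
from typing import Any, Dict


class PlaceholderDataSet:
    """Hypothetical stand-in dataset that never touches a database.

    Assumption: in a real project this would subclass Kedro's abstract
    dataset base class; it is a plain class here so the sketch runs alone.
    """

    def __init__(self, **kwargs: Any) -> None:
        # Accept and ignore whatever options the catalog entry passes,
        # so nothing is connected to when the catalog is created.
        self._ignored_options = kwargs

    def _load(self) -> Any:
        # Fail loudly if a pipeline actually tries to use the placeholder.
        raise RuntimeError("PlaceholderDataSet cannot be loaded; use the real dataset.")

    def _save(self, data: Any) -> None:
        raise RuntimeError("PlaceholderDataSet cannot be saved to; use the real dataset.")

    def _describe(self) -> Dict[str, Any]:
        return {"placeholder": True}
```

The entry in `conf/local/catalog.yml` would then override only the `type` of the database-backed dataset, pointing it at wherever this class lives in the project. Pipelines that never load the dataset can run offline, and any pipeline that does load it fails with an explicit error.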

Iñigo Hidalgo

07/20/2023, 4:36 PM

For Ibis I guess I could also delay the connection creation to the first time an instance loads and then still reuse that connection 🤔 Seems less clean to me for some reason though, and I wonder what the true value would be.
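
The idea of delaying the connection to the first load while still reusing it could be sketched roughly as follows. This is not the Ibis dataset's implementation; `LazyConnectionDataSet`, its `connect` argument, and `fake_connect` are hypothetical names used only for illustration:

```python
from typing import Any, Callable, Dict, List


class LazyConnectionDataSet:
    """Hypothetical dataset that opens its connection on first load."""

    # Shared across instances so each connection string is opened only once.
    _connections: Dict[str, Any] = {}

    def __init__(self, con: str, connect: Callable[[str], Any]) -> None:
        # Only store the parameters; nothing is connected to at creation time.
        self._con = con
        self._connect = connect

    def _get_connection(self) -> Any:
        cache = type(self)._connections
        if self._con not in cache:
            cache[self._con] = self._connect(self._con)
        return cache[self._con]

    def load(self) -> Any:
        # A real dataset would run its query here; the sketch just
        # returns the connection to show when it gets created.
        return self._get_connection()


calls: List[str] = []


def fake_connect(con: str) -> object:
    """Stand-in for a real connection factory; records each call."""
    calls.append(con)
    return object()


first = LazyConnectionDataSet("sqlite://", fake_connect)
second = LazyConnectionDataSet("sqlite://", fake_connect)
```

With this shape, building the catalog creates both instances without calling `fake_connect`; the connection is opened on the first `load()` and shared by later instances that use the same connection string.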

Marc Gris

07/21/2023, 9:02 AM

Hi @Iñigo Hidalgo, thx for your messages and suggestions. 👍🏼 🙂 One of my use cases was working offline without a local db to query… My “reasoning” (laziness? 😅) is that it would be really nice & neat if, when only `pipeline_c` depends on a SQL dataset, `kedro run -p pipeline_a -pipeline_b` would work without any problem (and without having to “muddle” with anything)…

👍🏼 1

Nok Lam Chan

07/21/2023, 10:02 AM

This is a reasonable ask. Nudging for a GH issue so it gets discussed by the team.

👍🏼 1

Marc Gris

07/21/2023, 10:04 AM

Yes. Will take time to do this today 👍🏼
