Hi everyone, `local/catalog.yml` does not overrid...
# questions
m
Hi everyone, `local/catalog.yml` does not override the `base/catalog.yml`… Any idea what could cause this behavior? Thx M.
n
Which ConfigLoader are you using? And can you show the relevant config for both `yml` files? P.S. I know I still owe you answers to some other questions you asked; I will get back to them.
👍🏼 1
m
image.png
oh… I think I now see what’s wrong 😅 ! Let me check that…
Ok… It’s a bit stupid, but, putting my ego aside, I’ll share what was wrong, because I’d love to have your advice on how to best handle this
image.png
image.png
My assumption was that commenting out a dataset in the local catalog would result in this dataset not being created… But of course, since comments are not evaluated, the `{{ table }}.db_columns` entry in `base` does get created in the catalog…
(Nota bene: this dataset only concerns one of many pipelines.) Let’s assume that I’m working offline. Since commenting out in `local` does not impact `base`, is it possible to work on / run the other pipelines in my project:
• without commenting things out in `base`
• and without having to create a local db?
Thanks in advance for your advice / suggestions. Regards, M.
P.S.: of course, it is not out of laziness that I’m reluctant to comment things out in `base` 😅 … It’s just that “messing with `base`” to be able to work locally feels “off”.
n
I see what you mean, but would your pipeline still work if you removed this dataset?
m
Yes, 3 out of 4 modular pipelines do not rely on this dataset.
n
`local` is meant to override `base`; when you comment an entry out in `local`, there is simply nothing left to override.
👍 2
Does it affect you whether or not this dataset is in your catalog, if you are not using it?
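The override behaviour described above can be pictured with a small sketch. This is only a simplified model that assumes the merge happens on top-level catalog keys; the dataset names, types, and file paths are illustrative and not taken from the project in question.

```python
# Simplified model of environment merging: a top-level key in `local`
# replaces the same key in `base`; keys absent from `local` are untouched.
# All entries below are illustrative assumptions.
base_catalog = {
    "db_columns": {"type": "pandas.SQLQueryDataset", "sql": "SELECT 1"},
    "model_input": {"type": "pandas.CSVDataset", "filepath": "data/input.csv"},
}

# The `db_columns` entry is commented out in the local file, so after
# YAML parsing it simply does not exist in this dict.
local_catalog = {
    "model_input": {"type": "pandas.CSVDataset", "filepath": "data/local_input.csv"},
}


def merge_environments(base: dict, local: dict) -> dict:
    """Return a new dict in which entries from `local` override `base`."""
    return {**base, **local}


merged = merge_environments(base_catalog, local_catalog)
print(sorted(merged))                     # both dataset names are still present
print(merged["model_input"]["filepath"])  # the local value wins for this key
```

Because the commented-out entry never reaches the merge, the `base` definition of `db_columns` survives unchanged.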
m
I’m not sure I understand your question. I’m personally not (yet) concerned with the pipeline that depends on that dataset. I work on the pipelines “downstream” from it. But my colleagues are… hence its presence in `base`… 🙂 Don’t you think that this edge case might point to the usefulness of a mechanism that would allow `local` to “ignore” things in `base`, i.e. something like this in `conf/local/catalog.yml`:
dataset:
    type: to_ignore
What do you think ? 🙂 Would that deserve a feature request ?
n
> and without having to create a local db
I guess I am not clear about this yet: why do you need to create a local db? From my understanding this dataset doesn’t concern your pipeline, so whether or not it exists shouldn’t affect your pipeline.
m
=> If I’m working offline and “do not mess around” with `base`… then running `kedro run -p my-pipeline-without-db` raises an error, since the creation of the catalog is “modular-pipeline-agnostic”.
Even if I do not actually need the dataset for the pipeline I’m working on, kedro will still try to query the db…
👍🏼 1
n
I think the more important question here is that the connection is created when the dataset is created; could we delay it until the first query? If you can open an issue about this on GH, that would be great. As to your point about using `local` to remove certain entries: I understand the idea of not touching `base` and using `local` as the only source of change. But if the above problem is fixed, I don’t see a strong need for this. If you think it is important, feel free to open an issue for it too, but I suggest filing them separately.
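To make the point about delaying the connection concrete, here is a minimal, library-free sketch. It is not Kedro's implementation: `connect`, `EagerSQLDataset`, `LazySQLDataset`, `build_catalog`, and the URL are all hypothetical names, and `connect` always fails to simulate working offline.

```python
def connect(url: str):
    """Hypothetical connector that always fails, simulating being offline."""
    raise ConnectionError(f"cannot reach {url}")


class EagerSQLDataset:
    """Connects in the constructor, i.e. as soon as the catalog is built."""

    def __init__(self, url: str):
        self._conn = connect(url)


class LazySQLDataset:
    """Stores the URL only and connects on the first load."""

    def __init__(self, url: str):
        self._url = url
        self._conn = None

    def load(self):
        if self._conn is None:
            self._conn = connect(self._url)
        return self._conn


def build_catalog(dataset_cls) -> dict:
    """Instantiate every entry, regardless of which pipeline will run."""
    return {"db_columns": dataset_cls("postgresql://db.example/prod")}


try:
    build_catalog(EagerSQLDataset)
    eager_ok = True
except ConnectionError:
    eager_ok = False  # building the catalog already fails offline

catalog = build_catalog(LazySQLDataset)  # no connection attempted yet

try:
    catalog["db_columns"].load()
    load_ok = True
except ConnectionError:
    load_ok = False  # the failure only appears when the dataset is used

print(eager_ok, load_ok)
```

With the lazy variant, a pipeline that never loads `db_columns` would not touch the database at all, which is the behaviour being asked for.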
m
Thanks for your comments. Good point regarding the lazy creation of datasets. 👍🏼 Will create a feature request on GH. Thx again. M
n
For some reason I thought there had been changes related to the connection, but I cannot find them. Cc @Iñigo Hidalgo: for the Ibis dataset, how is the connection handled?
I found an old PR that actually changed it to a singleton connection: https://github.com/kedro-org/kedro/pull/1163
Feel free to create the issue, I will link the issues properly and discuss with team.
m
👍🏼 🙂
i
For the Ibis dataset the connection is created on catalog initialization, when the first instance of the dataset is initialized. Subsequent instances will use the same connection. That’s only my current implementation, I never evaluated the time it takes to generate the connection, but I do think that creating a new connection on load would be kinda heavy
I haven’t followed the whole discussion, but could you create a dummy dataset type to override the type locally which just doesn’t do anything?
👍🏼 1
👀 1
Obviously it would break if you try to run a pipeline with it
But if you just need a placeholder to run some other part of your code it should work
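A rough idea of what such a placeholder could look like, written without any Kedro import so it stays self-contained. `PlaceholderDataset` is a hypothetical name; in a real project the class would presumably subclass Kedro's dataset base class and be referenced from the local catalog by its dotted path in `type:`.

```python
class PlaceholderDataset:
    """Stand-in catalog entry that accepts any config and touches no external resource."""

    def __init__(self, **ignored_config):
        # Keep whatever was configured purely for debugging purposes.
        self._ignored_config = ignored_config

    def load(self):
        # Fail loudly if a pipeline actually tries to read from the placeholder.
        raise RuntimeError("PlaceholderDataset cannot be loaded; it only fills a catalog slot.")

    def save(self, data):
        # Same for writes: the placeholder never stores anything.
        raise RuntimeError("PlaceholderDataset cannot be saved to; it only fills a catalog slot.")


placeholder = PlaceholderDataset(sql="SELECT 1")  # constructing it needs no database
```

Constructing the object is harmless, so catalog creation succeeds; only a pipeline that really reads or writes the entry fails, with an explicit message.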
For Ibis I guess I could also delay the connection creation to the first time an instance loads and then still reuse that connection 🤔 Seems less clean to me for some reason though, and I wonder what the true value would be.
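The "connect on first load, then reuse" idea could be sketched as follows. This is not the Ibis dataset's actual code: `SharedConnectionDataset` is a hypothetical class, and the standard-library `sqlite3` module stands in for a real backend.

```python
import sqlite3


class SharedConnectionDataset:
    """Opens one connection per URL on first load and shares it across instances."""

    _connections: dict = {}  # class-level cache keyed by connection string

    def __init__(self, url: str, query: str):
        # Nothing is opened here; only the configuration is stored.
        self._url = url
        self._query = query

    @classmethod
    def _get_connection(cls, url: str) -> sqlite3.Connection:
        # Create the connection the first time it is needed, then reuse it.
        if url not in cls._connections:
            cls._connections[url] = sqlite3.connect(url)
        return cls._connections[url]

    def load(self) -> list:
        conn = self._get_connection(self._url)
        return conn.execute(self._query).fetchall()


first = SharedConnectionDataset(":memory:", "SELECT 1")
second = SharedConnectionDataset(":memory:", "SELECT 2")
connections_before_load = len(SharedConnectionDataset._connections)

rows = first.load() + second.load()
connections_after_load = len(SharedConnectionDataset._connections)
print(rows, connections_before_load, connections_after_load)
```

Both instances end up sharing a single connection, yet none is opened until something is actually loaded.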
m
Hi @Iñigo Hidalgo, thx for your messages and suggestions. 👍🏼 🙂 One of my use cases is working offline without a local db to query… My “reasoning” (laziness? 😅) is that, if only `pipeline_c` depends on a SQL dataset, it would be really nice & neat if `kedro run -p pipeline_a -pipeline_b` worked without any problem (and without having to “muddle” with anything)…
👍🏼 1
n
This is a reasonable ask. Nudging for a GH issue so it gets discussed by the team.
👍🏼 1
m
Yes. Will take time to do this today 👍🏼