#questions

Iñigo Hidalgo

11/08/2023, 5:18 PM
Following up on this discussion here, @datajoely
I have a time-series problem where I compute a lot of lags, rolling statistics etc. When designing my training pipeline, I have a target number of days I want my master table to include.
Due to the way lags are carried out in pandas, we need to pad our initial queries by the maximum length of lag, as otherwise we would get nulls at the start. This maximum would then be an input to some initial nodes which filter sql tables.
I would then run something like
kedro run --params target_date:2023-11-01
and whilst it's technically possible, it's not nice to feed runtime arguments into catalog definitions to dynamically change load behaviour.
We do this for simple behaviours, but runtime params are kinda limited when working with nested dicts. Maybe hooks could be a way forward? after_catalog_created?
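
The padding described above (pulling the query window back by the maximum lag so the first kept rows are not null) can be sketched as follows. This is only an illustration under assumptions: `padded_query_window`, `n_days` and `max_lag_days` are hypothetical names, the data is assumed to be daily, and 30 and 7 are example values not taken from the thread.

```python
from datetime import date, timedelta


def padded_query_window(target_date: date, n_days: int, max_lag_days: int):
    """Return (query_start, first_kept_day, target_date) for a daily table.

    n_days is how many days the master table should contain, ending at
    target_date (inclusive). max_lag_days is the longest lookback used by
    any lag or rolling statistic (hypothetical parameter names).
    """
    # First day that should survive into the master table.
    first_kept_day = target_date - timedelta(days=n_days - 1)
    # Query further back so lags for first_kept_day have data to look at.
    query_start = first_kept_day - timedelta(days=max_lag_days)
    return query_start, first_kept_day, target_date


query_start, first_kept_day, end = padded_query_window(
    target_date=date(2023, 11, 1), n_days=30, max_lag_days=7
)
print(query_start, first_kept_day, end)
```

The SQL filter would use `query_start`, and the rows before `first_kept_day` would be dropped once the lags have been computed.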

Nok Lam Chan

11/08/2023, 5:21 PM
what do you mean by working with nested dict?

Iñigo Hidalgo

11/08/2023, 5:31 PM
Say I have some config like
filter_x:
  col_to_filter: datetime
  values_between: [2020-01-01, 2020-01-31]
The node's input is params:filter_x. If I wanted to only change the final item in the list, I would need to pass the whole dictionary anew in the CLI, because the update in params:filter_x.values_between isn't actually propagated to the value of params:filter_x in the catalog.
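
To make the difference concrete, here is a minimal pure-Python sketch contrasting a top-level (destructive) update with a recursive "soft" merge on the config above. It is not Kedro's implementation; `soft_merge` is a hypothetical helper and the override values are made up for illustration.

```python
from copy import deepcopy


def soft_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base, keeping untouched keys."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing wholesale.
            merged[key] = soft_merge(merged[key], value)
        else:
            # Scalars and lists are replaced as a whole.
            merged[key] = deepcopy(value)
    return merged


params = {
    "filter_x": {
        "col_to_filter": "datetime",
        "values_between": ["2020-01-01", "2020-01-31"],
    }
}
# Hypothetical runtime override touching only one nested key.
override = {"filter_x": {"values_between": ["2020-01-01", "2020-02-29"]}}

destructive = {**params, **override}  # filter_x is replaced: col_to_filter is lost
soft = soft_merge(params, override)   # col_to_filter survives the override
```

Note that even with a soft merge the list itself is replaced as a unit, so changing only its final item still means passing both dates.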

datajoely

11/08/2023, 5:32 PM
In 0.19.x this will be much easier with the soft merge OmegaConf mode

Nok Lam Chan

11/08/2023, 5:33 PM
"If I wanted to only change the final item in the list, I would need to pass the whole dictionary anew in the cli"
Why? I thought it only updates the one key

Iñigo Hidalgo

11/08/2023, 5:34 PM
Thanks for the heads up @datajoely, probably not good to overindex on this issue atm

Nok Lam Chan

11/08/2023, 5:37 PM
Maybe I remember it wrongly. I actually proposed soft update a few years ago, and then I was confused that this PR got merged.

Iñigo Hidalgo

11/08/2023, 5:38 PM

Nok Lam Chan

11/08/2023, 5:39 PM
Yeah - so I thought it works

Iñigo Hidalgo

11/08/2023, 5:39 PM
I need to check the version of that commit
do you have an easy way to check?
bc I'm on 0.17.1

Nok Lam Chan

11/08/2023, 5:39 PM
Have you tried to run it? Is it not working?
wait

Iñigo Hidalgo

11/08/2023, 5:40 PM
It's not an issue I'm having this instant, but it's an issue I've run into in the past when trying similar things

Nok Lam Chan

11/08/2023, 5:40 PM
0.17.6 is the first version that has it
I think soft update has always been possible, but it's kinda undocumented.

Iñigo Hidalgo

11/08/2023, 5:42 PM
🤡
Thanks for confirming that. All our projects are still on 0.17.1 so that's why

Nok Lam Chan

11/08/2023, 5:47 PM
Is there a reason why you cannot upgrade to a later 0.17.x, since it is supposed to be a non-breaking release?

Iñigo Hidalgo

11/08/2023, 6:00 PM
I know there was some stuff that broke, at least related to how we configured our cli.py file or something like that. We're basically just working on migrating to the latest 0.18.x, but it hasn't been a massive priority for us so it's slow progress
i couldn't tell you for certain if it was that or some version pinning issue

Nok Lam Chan

11/08/2023, 6:04 PM
I see, not sure what the issue with cli.py is. If it's version pinning, it is most likely datasets. You can now use kedro-datasets even if you are on 0.17.x. The downside is that you need to define the full dataset path, just like for a third-party plugin, i.e. kedro_datasets.pandas.CSVDataset

Iñigo Hidalgo

11/08/2023, 6:05 PM
anyways i might go the way of the after_catalog_created hook since that should have all the info i need
all our datasets are also custom components from before azure was officially supported D:
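
A rough sketch of what such an after_catalog_created hook might look like is below. It is an assumption-heavy illustration, not a tested recipe for any particular Kedro version: the class name and the `n_days`, `max_lag_days` and `query_start` parameters are hypothetical, and it assumes the hook receives `feed_dict` with `params:`-prefixed keys and that the catalog exposes `add_feed_dict`, which is version-dependent. The `ImportError` fallback exists only so the snippet can be run without Kedro installed.

```python
from datetime import date, timedelta

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func


class PaddedWindowHooks:
    """Hypothetical hook deriving a padded query start from runtime params."""

    @hook_impl
    def after_catalog_created(self, catalog, feed_dict):
        # Assumed parameter names; only target_date appears in the thread.
        target = date.fromisoformat(str(feed_dict["params:target_date"]))
        n_days = int(feed_dict["params:n_days"])
        max_lag = int(feed_dict["params:max_lag_days"])
        # Go back far enough to cover the target window plus the longest lag.
        query_start = target - timedelta(days=n_days - 1 + max_lag)
        # Expose the derived value so nodes can take params:query_start as input.
        catalog.add_feed_dict(
            {"params:query_start": query_start.isoformat()}, replace=True
        )
```

The hook would still need to be registered with the project, and a node filtering the SQL tables could then take params:query_start as an input instead of relying on a templated catalog entry.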