# questions
i
Following up this discussion here @datajoely
I have a time-series problem where I compute a lot of lags, rolling statistics etc. When designing my training pipeline, I have a target number of days I want my master table to include.
Due to the way lags are carried out in pandas, we need to pad our initial queries by the maximum length of lag, as otherwise we would get nulls at the start. This maximum would then be an input to some initial nodes which filter sql tables.
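A minimal sketch of the padding idea, using a toy daily frame in place of the filtered SQL tables. `MAX_LAG`, `TARGET_DAYS` and the column names are made up for illustration and are not from the actual project:

```python
import pandas as pd

MAX_LAG = 3      # hypothetical: largest lag / rolling window used downstream
TARGET_DAYS = 5  # hypothetical: number of days wanted in the master table
target_date = pd.Timestamp("2023-11-01")

# Pad the query window by MAX_LAG so the first target day has full history.
query_start = target_date - pd.Timedelta(days=TARGET_DAYS - 1 + MAX_LAG)

# Stand-in for the filtered SQL result: one row per day.
df = pd.DataFrame({"date": pd.date_range(query_start, target_date, freq="D")})
df["value"] = range(len(df))

# shift() leaves NaN in the first MAX_LAG rows; these fall inside the padding.
df["value_lag3"] = df["value"].shift(MAX_LAG)

# Trim the padding so only the target window remains.
master = df[df["date"] > target_date - pd.Timedelta(days=TARGET_DAYS)]
```

Without the padding, the first `MAX_LAG` rows of the target window would carry nulls in the lagged column.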
run something like
kedro run --params target_date:2023-11-01
and whilst it's technically possible, it's not nice to feed runtime arguments into catalog definitions to dynamically change load behaviour.
we do this for simple behaviours, but runtime params are kinda limited when working with nested dicts. maybe hooks could be a way forward? after_catalog_created?
n
what do you mean by working with nested dict?
i
say I have some config like
filter_x:
  col_to_filter: datetime
  values_between: [2020-01-01, 2020-01-31]
node's input is params:filter_x
If I wanted to only change the final item in the list, I would need to pass the whole dictionary anew in the cli, because the update in params:filter_x.values_between isn't actually propagated to the value of params:filter_x in the catalog
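To make the difference concrete, here is a plain-Python sketch of a destructive override versus a soft (nested) one. `set_dotted` is a hypothetical helper written for this example, not Kedro's implementation, and the February end date is made up:

```python
from copy import deepcopy


def set_dotted(params, dotted_key, value):
    """Return a copy of params with one nested key overridden, siblings kept."""
    out = deepcopy(params)
    node = out
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return out


params = {
    "filter_x": {
        "col_to_filter": "datetime",
        "values_between": ["2020-01-01", "2020-01-31"],
    }
}
new_range = ["2020-01-01", "2020-02-29"]

# Destructive: replacing the top-level key drops col_to_filter.
destructive = {**params, "filter_x": {"values_between": new_range}}

# Soft: only the addressed leaf changes, the rest of filter_x survives.
soft = set_dotted(params, "filter_x.values_between", new_range)
```

Note that the list itself is still replaced as a whole in both cases; changing only its last item would need the full list in the override.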
d
In 0.19.x this will be much easier with the OmegaConf soft merge mode
💡 1
n
If I wanted to only change the final item in the list, I would need to pass the whole dictionary anew in the cli
Why? I thought it only updates the one key
i
Thanks for the heads up @datajoely, probably not good to overindex on this issue atm
n
Maybe I'm remembering it wrongly. I actually proposed soft update a few years ago, so I was confused, since this PR got merged.
i
n
Yeah - so I thought it works
i
I need to check the version of that commit
do you have an easy way to check?
👍🏼 1
bc im on 0.17.1
n
have you tried running it? is it not working?
wait
i
It's not an issue I'm having this instant, but it's an issue I've run into in the past when trying similar things
n
0.17.6 is the first version that has it
👍 1
I think softupdate is always possible but kinda undocumented.
i
🤡
Thanks for confirming that. All our projects are still on 0.17.1 so that's why
n
Is there a reason why you cannot upgrade to the latest 0.17.x? It is supposed to be a non-breaking release.
i
Ik there was some stuff that broke, at least related to how we configured our cli.py file or something like that. We're basically just working on migrating to the latest 0.18.x, but it hasn't been a massive priority for us so it's slow progress
👀 1
i couldn't tell you for certain if it was that or some version pinning issue
n
I see, not sure what's the issue with cli.py; if it's version pinning, it is most likely datasets. You can now use kedro-datasets even if you are on 0.17.x. The downside is you need to define the full dataset path, just like a third-party plugin, i.e. kedro_datasets.pandas.CSVDataset
i
anyways i might go the way of the after_catalog_created hook since that should have all the info i need
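A rough sketch of what such a hook could look like, assuming the catalog already holds the run parameters when `after_catalog_created` fires and that `DataCatalog.add_feed_dict` is available on the Kedro version in use. The parameter names `n_days` and `max_lag` and the `params:query_window` entry are hypothetical; the import fallback is only there so the sketch runs without Kedro installed:

```python
from datetime import date, timedelta

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch runs without Kedro installed
    def hook_impl(func):
        return func


def padded_window(target_date, n_days, max_lag):
    """Return [start, end] ISO dates, with start padded by max_lag days."""
    end = date.fromisoformat(target_date)
    start = end - timedelta(days=n_days - 1 + max_lag)
    return [start.isoformat(), end.isoformat()]


class PaddedWindowHooks:
    @hook_impl
    def after_catalog_created(self, catalog, feed_dict):
        # "parameters" holds the merged params, including --params overrides.
        params = feed_dict["parameters"]
        window = padded_window(
            str(params["target_date"]), params["n_days"], params["max_lag"]
        )
        # Hypothetical entry that the filtering nodes could take as input.
        catalog.add_feed_dict({"params:query_window": window}, replace=True)
```

The class would still need registering in the project's hooks setup, and the filtering nodes would take `params:query_window` as an input instead of a hard-coded range.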
all our datasets are also custom components from before azure was officially supported D:
👀 1