# questions
v
Hi, I’m on kedro 0.18.3 trying to override a templated variable in the data catalog with runtime configuration. So `catalog.yml` has `filepath: "${configurable_filepath}"` and I’d like to do `kedro run --params configurable_filepath:/path/to/file`. A similar question was asked previously https://linen-discord.kedro.org/t/2203662/Hi-all-I-have-a-beginner-question-on-Kedro-0-18-2-I-have-a-T with a custom TemplatedConfigLoader as the solution: https://github.com/noklam/kedro_gallery/blob/master/template_config_loader_demo/src/template_config_loader_demo/settings.py Is this the recommended approach, or is there a way of achieving what I want without writing a custom TemplatedConfigLoader that accesses private variables? Is there no other way to add all runtime parameters to the globals dict? I’d really like to avoid that if possible, in case a future kedro update changes things.
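For context, the linked solution is roughly of this shape (a minimal sketch on my part; `_config_mapping` is exactly the kind of private attribute I’d rather not depend on):
```python
# settings.py (sketch): subclass TemplatedConfigLoader and merge the runtime
# --params values into the dict used to resolve ${...} placeholders.
from kedro.config import TemplatedConfigLoader


class RuntimeTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
        super().__init__(conf_source, env=env, runtime_params=runtime_params, **kwargs)
        # Assumption: _config_mapping is the private globals mapping that
        # TemplatedConfigLoader resolves catalog templates from in 0.18.x.
        self._config_mapping.update(runtime_params or {})


CONFIG_LOADER_CLASS = RuntimeTemplatedConfigLoader
```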
d
In general we don’t expect filepaths to be dynamic; it makes reproducibility very difficult, which is why it’s not natively supported
My question to you is why do you need to pass the file in at runtime?
v
The example would be running the same job/pipeline with multiple different inputs, i.e. filepaths
d
Well we have a solution for that called Modular Pipelines https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/modular_pipelines.html
the idea is that you can reuse the same code for different inputs
you would still need to provide static catalog entries, but it lets you essentially ‘instantiate’ a pipeline and namespace the process
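roughly, the pattern looks like this (a minimal sketch; the dataset and function names are placeholders):
```python
from kedro.pipeline import node, pipeline


def process(df):
    return df


# One definition of the processing logic...
base = pipeline([node(process, inputs="raw_data", outputs="processed_data")])

# ...instantiated once per input source, with namespaced catalog entries
# (e.g. "source_a.processed_data", "source_b.processed_data").
variant_a = pipeline(base, namespace="source_a", inputs={"raw_data": "source_a_raw"})
variant_b = pipeline(base, namespace="source_b", inputs={"raw_data": "source_b_raw"})
```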
v
Say you need to run a pipeline for a varying number of inputs, where the inputs may come from a query in a database. That doesn’t seem compatible, or at least convenient, with a static catalog.
d
ah gotcha - that goes against the principle of reproducibility
v
I don’t see that. If I really wanted to reproduce it, I could keep track of all the inputs I’ve had over time and reproduce it. It’s just not convenient to represent the input as something static
d
so you can absolutely achieve this; probably the best way to do so is with a pipeline hook which mutates the catalog
v
But… I’m not here to argue about kedro’s design choices ofc 🙂
d
but it’s a piece we’re opinionated on, because it has made things difficult to maintain/debug in the past
v
Yes, ok. I’m just thinking that in reality data sources are often changing. At least in my world 🙂 But then I know there is no “easy” way.
d
hooks are super nice to work with
the before_pipeline_run hook has access to the live catalog, and `extra_params` can be passed from the CLI
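something along these lines (a sketch; the dataset name "my_input" and the dataset type are placeholders, swap in your own):
```python
# hooks.py (sketch): point a catalog entry at a filepath passed via --params.
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.framework.hooks import hook_impl


class RuntimeFilepathHook:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        filepath = run_params["extra_params"].get("configurable_filepath")
        if filepath:
            # Re-register the dataset so it reads from the runtime path.
            catalog.add("my_input", CSVDataSet(filepath=filepath), replace=True)
```
register it in `settings.py` with `HOOKS = (RuntimeFilepathHook(),)` and then `kedro run --params configurable_filepath:/path/to/file` works as you described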
v
ok, thanks
So, after having a look at the before_pipeline_run hook, it looks like I’d have to break into some dataset objects. Even if I could find a `path` attribute to change there, I’m not sure it improves on the original solution of a custom TemplatedConfigLoader. Am I missing something with what you proposed?