# questions
j
Hi all! Kedro 0.18.7: We’re trying to modify the pipeline’s “extra params” on the fly, and we guessed that a `before_pipeline_run` hook is the way to go. Can you advise us on the best way to achieve this? What we’ve tried so far is condensed in this code:
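```python
from typing import Any, Dict

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, MemoryDataSet


class ParamsHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Any, catalog: DataCatalog
    ) -> None:
        catalog.add_feed_dict({"params:country": MemoryDataSet("ESP")}, replace=True)
```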
In the hook:
1. In `run_params` we can see `'extra_params': {'country': 'USA'}`.
2. `catalog.list()` contains the entry `'params:country'`, both before and after invoking `add_feed_dict`.

But when the params are printed in the node, the value persists with the original value set by `kedro run --params country=USA`.
Many thanks in advance! NOTE: The objective here is to be able to pass a list from the CLI, let’s say `--params countries="ESP<>USA"`, and do the split in the hook.
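A minimal sketch of the split the hook would perform, assuming `<>` as the list separator. The helper name `split_cli_list` is hypothetical, not part of Kedro. Non-string values (for example a list already defined in `parameters.yml`) pass through unchanged:

```python
from typing import Any


def split_cli_list(value: Any, sep: str = "<>") -> Any:
    """Turn a CLI string such as "ESP<>USA" into ["ESP", "USA"].

    Hypothetical helper: values that are not strings (e.g. a list coming
    from parameters.yml, or a number) are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    # Strip whitespace around each item and drop empty items, such as the
    # one a trailing separator ("ESP<>") would produce.
    return [item.strip() for item in value.split(sep) if item.strip()]


print(split_cli_list("ESP<>USA"))       # ['ESP', 'USA']
print(split_cli_list("ESP"))            # ['ESP']
print(split_cli_list(["ESP", "USA"]))   # ['ESP', 'USA']
```

Design choice: a single value such as `"ESP"` still becomes a one-element list, so a parameter meant to be a list always has the same type in the node, whether it came from the CLI or from the config file.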
f
If you want to pass a list, you can run the pipeline in Python:
```python
from kedro.framework.session import KedroSession

with KedroSession.create(extra_params={"country": ["ESP", "USA"]}) as s:
    s.run(pipeline_name="your_pipeline_name")
```
j
that’s a solution, but if possible we want to address this with the hook mechanism. Thx BTW
f
If you want to get the list from the CLI, you can do something like this, but it seems to be what you are doing:
```python
countries = catalog.load("params:country").split("<>")
catalog.add_feed_dict({"params:country": MemoryDataSet(countries)}, replace=True)
```
or am I missing something?
I checked it with a dummy pipeline that only prints its argument, and with the hook you correctly get a list.
j
no, that’s it, but for some reason the country param retains its original value when reaching a node
f
Note that the syntax for the params is `--params country:'ESP<>USA'`, with `:` (not `=`) as the separator between key and value.
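To see why the `:` separator matters and why the list needs a custom separator such as `<>`: in Kedro 0.18, `--params` takes comma-separated `key:value` pairs and splits each pair on its first `:`. The sketch below is a simplified approximation of that behaviour, not Kedro's actual code (the real CLI does more, e.g. it also tries to convert numeric values); `parse_params_option` is a hypothetical name. The quotes around the value also matter, because `<` and `>` are redirection operators in POSIX shells.

```python
from typing import Dict


def parse_params_option(raw: str) -> Dict[str, str]:
    """Simplified sketch of how a --params string becomes extra params.

    Pairs are separated by commas and each pair is split on its first ":".
    Numeric conversion done by the real CLI is omitted here.
    """
    result: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"{item!r} must be a key and a value separated by ':'")
        result[key.strip()] = value.strip()
    return result


# The shell has already removed the quotes from country:'ESP<>USA'.
print(parse_params_option("country:ESP<>USA"))  # {'country': 'ESP<>USA'}
```

In this sketch, `"country=USA"` raises `ValueError` (no `:`), and `"country:ESP,USA"` also fails, because the comma starts a new pair (`USA`) that has no key; hence a separator like `<>` inside the value.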
I have a pipeline that is only:
```python
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [node(lambda x: print(repr(x)), inputs="params:country", outputs=None)]
    )
```
and I can correctly see the list.
j
Running this:
```
kedro run --params countries="ESP<>USA"
```
with this hook:
```python
class ParamsHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Any, catalog: DataCatalog
    ) -> None:
        countries = catalog.load("params:countries").split("<>")
        print("countries in hook:", countries)
        catalog.add_feed_dict(
            {"params:countries": MemoryDataSet(countries)}, replace=True
        )
```
I get:
```
countries in hook: ['ESP', 'USA']
```
And having this in the node:
```python
def clean_raw(data: pd.DataFrame, params: Dict):
    countries = params["countries"]

    print("countries in node: ", countries)
```
I get:
```
countries in node:  ESP<>USA
```
f
You should not pass the whole `params` dict to your node, but rather the specific parameter, as I did above.
Your hook did not change the whole parameters dict, only the one input named `params:countries`.
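A toy model of what happens, assuming (as in Kedro 0.18) that the catalog holds the whole parameters dict under `parameters` and a separate entry for each top-level value under `params:<key>`. `ToyCatalog` is a hypothetical stand-in built on plain dicts, not Kedro's `DataCatalog`: replacing one entry does not touch the other, so a node reading `parameters["countries"]` still sees the original string.

```python
import copy
from typing import Any, Dict


class ToyCatalog:
    """Hypothetical stand-in for a data catalog: named entries holding values."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def add_feed_dict(self, feed: Dict[str, Any]) -> None:
        # Each entry stores its own deep copy, loosely mirroring how an
        # in-memory dataset keeps its data separate from the caller's object.
        for name, value in feed.items():
            self._entries[name] = copy.deepcopy(value)

    def load(self, name: str) -> Any:
        return copy.deepcopy(self._entries[name])


params = {"countries": "ESP<>USA"}
catalog = ToyCatalog()
# One entry for the whole dict plus one "params:<key>" entry per top-level key.
catalog.add_feed_dict({"parameters": params})
catalog.add_feed_dict({f"params:{k}": v for k, v in params.items()})

# What the hook did: replace only the single-parameter entry.
countries = catalog.load("params:countries").split("<>")
catalog.add_feed_dict({"params:countries": countries})

print(catalog.load("params:countries"))         # ['ESP', 'USA']
print(catalog.load("parameters")["countries"])  # ESP<>USA
```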
j
so, there’s no solution where only hooks are involved?
f
Yes, the hook will work if you change your node to:
```python
def clean_raw(data: pd.DataFrame, countries: List[str]):
    print("countries in node: ", countries)
```
and your pipeline to:
```python
pipeline([..., node(clean_raw, inputs={"data": ..., "countries": "params:countries"}, outputs=...)], ...)
```
j
clear, but I meant without touching anything else, just the hooks. What we’re trying to do here is a standardisation: you can use lists in the parameters config, so we want to extend this behaviour to the CLI.
f
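```python
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, MemoryDataSet


class ParamsHook:
    @hook_impl
    def before_pipeline_run(self, catalog: DataCatalog) -> None:
        # Replace the single-parameter entry read by "params:country" inputs
        countries = catalog.load("params:country").split("<>")
        catalog.add_feed_dict(
            {"params:country": MemoryDataSet(countries)}, replace=True
        )
        # Also update the whole "parameters" dict, for nodes that take it as input
        parameters = catalog.load("parameters")
        parameters["country"] = countries
        catalog.add_feed_dict({"parameters": parameters}, replace=True)
```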
Here, you explicitly change both the individual parameter and the whole `parameters` dict.
But again, it is not good practice to pass the whole `parameters` dict as an input to a node.
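Since the stated goal is a standardisation, the same two-step update could be generalised to every top-level parameter whose value contains the separator. The sketch below works on a plain dict standing in for the catalog entries (`split_list_params` is a hypothetical name); in a real hook the reads and writes would go through `catalog.load(...)` and `catalog.add_feed_dict(..., replace=True)` as in the hook above. Only strings that actually contain `<>` are split, so ordinary string parameters keep their type; nested parameters are not handled here.

```python
import copy
from typing import Any, Dict

SEPARATOR = "<>"  # assumed list separator for CLI values


def split_list_params(entries: Dict[str, Any], sep: str = SEPARATOR) -> Dict[str, Any]:
    """Return updated catalog-like entries with "a<>b" strings turned into lists.

    `entries` maps entry names ("parameters", "params:<key>") to values.
    Both the whole "parameters" dict and each matching "params:<key>" entry
    are updated, since nodes may read either one. The input is not mutated.
    """
    updated = dict(entries)
    parameters = copy.deepcopy(entries["parameters"])
    for key, value in parameters.items():
        if isinstance(value, str) and sep in value:
            as_list = [item.strip() for item in value.split(sep) if item.strip()]
            parameters[key] = as_list
            updated[f"params:{key}"] = list(as_list)  # separate list object
    updated["parameters"] = parameters
    return updated


entries = {
    "parameters": {"countries": "ESP<>USA", "model": "xgb"},
    "params:countries": "ESP<>USA",
    "params:model": "xgb",
}
result = split_list_params(entries)
print(result["params:countries"])         # ['ESP', 'USA']
print(result["parameters"]["countries"])  # ['ESP', 'USA']
print(result["params:model"])             # xgb
```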
j
yes, we know, we have that refactoring task pending. Many thanks for your great help, Florian!
f
You're welcome!