Juan Diego
04/13/2023, 3:07 PM
before_pipeline_run hook is the way to go.
Can you advise us on the best way to achieve this?
What we have tried so far is condensed in this code:
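from typing import Any, Dict
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog, MemoryDataSet

class ParamsHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Any, catalog: DataCatalog
    ) -> None:
        catalog.add_feed_dict({"params:country": MemoryDataSet("ESP")}, replace=True)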
In the hook:
1. In run_params we can see: 'extra_params': {'country': 'USA'}
2. In catalog.list() there is this entry: 'params:country', both before and after invoking add_feed_dict
But when the params are printed in the node, the value remains the original one parsed from: kedro run --params country=USA
Many thanks in advance!
NOTE: The objective here is to be able to parse a list from the CLI, let’s say: --params countries="ESP<>USA", and do the split in the hook.
FlorianGD
04/13/2023, 3:12 PM
with KedroSession.create(extra_params={"country": ["ESP", "USA"]}) as s:
    s.run(pipeline_name="your_pipeline_name")
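If the list still has to arrive as a single delimited string, one option is to split it before creating the session. This is a minimal sketch in plain Python: the helper name split_countries and its default separator are assumptions for illustration, not part of Kedro, and the session call is left as a comment because it needs a Kedro project.

```python
from typing import List


def split_countries(raw: str, sep: str = "<>") -> List[str]:
    """Split a delimited string such as "ESP<>USA" into a list of codes."""
    # Drop surrounding whitespace and empty items (e.g. a trailing separator).
    return [item.strip() for item in raw.split(sep) if item.strip()]


countries = split_countries("ESP<>USA")

# Hypothetical usage inside a Kedro project (not executed here):
# with KedroSession.create(extra_params={"country": countries}) as s:
#     s.run(pipeline_name="your_pipeline_name")
```

Doing the split up front means the parameter is already a list by the time the run starts.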
Juan Diego
04/13/2023, 3:18 PM
FlorianGD
04/13/2023, 3:46 PM
countries = catalog.load("params:country").split("<>")
catalog.add_feed_dict({"params:country": MemoryDataSet(countries)}, replace=True)
or am I missing something?
Juan Diego
04/13/2023, 3:50 PM
FlorianGD
04/13/2023, 3:50 PM
--params country:'ESP<>USA'
with : not = as a separator for key and value
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [node(lambda x: print(repr(x)), inputs="params:country", outputs=None)]
    )
and I can correctly see the list
Juan Diego
04/13/2023, 3:57 PM
kedro run --params countries="ESP<>USA"
class ParamsHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Any, catalog: DataCatalog
    ) -> None:
        countries = catalog.load("params:countries").split("<>")
        print("countries in hook:", countries)
        catalog.add_feed_dict(
            {"params:countries": MemoryDataSet(countries)}, replace=True
        )
I get:
countries in hook: ['ESP', 'USA']
And having this in node:
def clean_raw(data: pd.DataFrame, params: Dict):
    countries = params["countries"]
    print("countries in node: ", countries)
countries in node: ESP<>USA
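One plausible explanation for the mismatch is that run parameters are exposed under two kinds of catalog entries: one whole-dictionary entry (parameters) and one entry per key (params:<key>). Replacing only the per-key entry leaves the dictionary untouched, so a node that reads the dictionary still sees the raw string. The sketch below uses a plain dict as a stand-in for the catalog; build_feed_dict is a hypothetical helper and a simplification, not Kedro's implementation.

```python
from typing import Any, Dict


def build_feed_dict(params: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified stand-in: expose params as a whole dict and per key."""
    feed: Dict[str, Any] = {"parameters": dict(params)}
    for key, value in params.items():
        feed[f"params:{key}"] = value
    return feed


feed = build_feed_dict({"countries": "ESP<>USA"})

# What the hook above does: replace only the per-key entry.
feed["params:countries"] = feed["params:countries"].split("<>")

per_key_entry = feed["params:countries"]            # updated by the hook
whole_dict_entry = feed["parameters"]["countries"]  # still the raw string
```

Under this assumption, a node whose input is the whole parameters dictionary would keep printing ESP<>USA, while a node whose input is params:countries would receive the list.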
FlorianGD
04/13/2023, 4:01 PM
params in your node, but rather pass the correct parameter as I did above: params:country
Juan Diego
04/13/2023, 4:05 PM
FlorianGD
04/13/2023, 4:07 PM
def clean_raw(data: pd.DataFrame, countries: List[str]):
    print("countries in node: ", countries)
and your pipeline to:
pipeline([..., node(clean_raw, inputs={"data": ..., "countries": "params:countries"}, outputs=...)], ...)
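A side benefit of passing the list as its own argument is that the node becomes a plain function that can be called directly, without a Kedro run. The sketch below is only an illustration: it uses a list of dicts instead of a DataFrame to stay dependency-free, and the filtering step and the "country" key are assumptions, since the node in this thread only prints its input.

```python
from typing import Dict, List


def clean_raw(data: List[Dict[str, str]], countries: List[str]) -> List[Dict[str, str]]:
    """Keep only the rows whose country is in the requested list."""
    return [row for row in data if row["country"] in countries]


# Hypothetical input rows, used only to exercise the function.
rows = [{"country": "ESP"}, {"country": "FRA"}, {"country": "USA"}]
kept = clean_raw(rows, ["ESP", "USA"])
```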
Juan Diego
04/13/2023, 4:12 PM
FlorianGD
04/13/2023, 4:13 PM
class ParamsHook:
    @hook_impl
    def before_pipeline_run(self, catalog: DataCatalog) -> None:
        countries = catalog.load("params:country").split("<>")
        catalog.add_feed_dict(
            {"params:country": MemoryDataSet(countries)}, replace=True
        )
        parameters = catalog.load("parameters")
        parameters["country"] = countries
        catalog.add_feed_dict({"parameters": parameters}, replace=True)
parameters
parameters
as an input for a node
Juan Diego
04/13/2023, 4:17 PM
FlorianGD
04/13/2023, 4:17 PM