#questions

Olivia Lihn

02/13/2023, 1:49 PM
Hi everyone! I'm trying to create a hook that overwrites some parameters when the scoring pipeline runs, but it doesn't seem to work: the parameters are neither written (if absent) nor overwritten (if present). The code I'm using is the following:
def before_pipeline_run(self, run_params, catalog: DataCatalog) -> None:
    """Change feature inclusion parameters for the scoring pipeline."""
    if run_params["pipeline_name"] == "scoring":
        # retrieve feature_list from catalog
        feature_list_df = catalog.load("modeling.feature_selection_report")
        feature_list = list(feature_list_df[feature_list_df.selected == True].feature.unique())

        # get list of feature engineering pipelines
        params = catalog.load("parameters")
        feateng_pipes = [fteng_name for fteng_name in params.keys() if fteng_name.endswith("_fteng")]

        # overwrite parameters
        for pipeline in feateng_pipes:
            catalog.add_all(
                {
                    f"params:{pipeline}.feature_inclusion_params.feature_list": feature_list,
                    f"params:{pipeline}.feature_inclusion_params.enable_regex": True,
                },
                replace=True,
            )
I also tried using `run_params["params"]` without any luck, and tried returning the catalog, but that didn't work either. The hook runs (tested with print statements), so my guess is I'm missing something. Thanks!

marrrcin

02/13/2023, 2:06 PM
`catalog.add` / `catalog.add_all` only register `AbstractDataSet` entries in the catalog; they don't save any data, so raw values need to be wrapped in a dataset first. My guess is that you need something like this:
from kedro.io import MemoryDataSet

catalog.add_all(
    {
        f"params:{pipeline}.feature_inclusion_params.feature_list": MemoryDataSet(feature_list),
        f"params:{pipeline}.feature_inclusion_params.enable_regex": MemoryDataSet(True),
    },
    replace=True,
)

Olivia Lihn

02/13/2023, 2:06 PM
Thanks Marrrcin! I'll give it a try.

marrrcin

02/13/2023, 2:08 PM
You should also consider calling `catalog.add_feed_dict` instead; this is what Kedro actually does to add parameters to the catalog.

Olivia Lihn

02/13/2023, 2:08 PM
Yep, I tried that first and it didn't work either, but I think it might work with the MemoryDataSet.
😞 Still the same: if I print the params just after calling add_feed_dict, they show no change.

marrrcin

02/13/2023, 2:11 PM
Where do you print the params?

Olivia Lihn

02/13/2023, 2:12 PM
Just after the call (inside the hook implementation), and then inside the node (just to make sure).

marrrcin

02/13/2023, 2:17 PM
OK, but do you use `parameters` or your specific keys, e.g. `f"params:{pipeline}.feature_inclusion_params.feature_list"`?
Kedro copies the parameters multiple times during initialization of the catalog, so you should overwrite both `parameters` and the specific keys if you want to use both in Kedro nodes, or just stick to one type.
It happens here: https://github.com/kedro-org/kedro/blob/c83ce9aa1f4b3ab29a35f06be1bbc56341cf71e5/kedro/framework/context/context.py#L305 See that Kedro adds a catalog entry for the `parameters` key as well as for every nested object, with a `params:` prefix. If you overwrite only `params:your_long_key` in the hook, then only nodes consuming the input in the form `params:your_long_key` will get the modified value. If you use `parameters` in your nodes, then you need to overwrite the whole `parameters` dict in your hook.
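
The registration behaviour described above can be sketched in plain Python, with no Kedro import. This is an illustration under assumptions, not the library's code: `build_feed_dict` and the sample parameters are hypothetical names, and `copy.deepcopy` is assumed here as a stand-in for the independent copy that each in-memory catalog entry holds.

```python
import copy
from typing import Any, Dict


def build_feed_dict(params: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of how nested parameters become separate catalog entries."""
    # The whole dictionary is registered under the "parameters" key.
    feed_dict: Dict[str, Any] = {"parameters": copy.deepcopy(params)}

    def add_param(name: str, value: Any) -> None:
        # Every level of nesting gets its own "params:"-prefixed entry.
        # deepcopy is an assumption: it mimics each entry holding its own copy.
        feed_dict[f"params:{name}"] = copy.deepcopy(value)
        if isinstance(value, dict):
            for child_name, child_value in value.items():
                add_param(f"{name}.{child_name}", child_value)

    for name, value in params.items():
        add_param(name, value)
    return feed_dict


# Hypothetical parameters, for illustration only.
params = {
    "a_fteng": {
        "feature_inclusion_params": {"feature_list": [], "enable_regex": False}
    }
}
feed_dict = build_feed_dict(params)

# Overwrite a single specific key, as the hook in the question does.
feed_dict["params:a_fteng.feature_inclusion_params.feature_list"] = ["x", "y"]

# The other entries still hold their own, unmodified copies.
print(sorted(feed_dict))
print(feed_dict["parameters"]["a_fteng"]["feature_inclusion_params"]["feature_list"])
print(feed_dict["params:a_fteng"]["feature_inclusion_params"]["feature_list"])
```

In this sketch the same value is reachable through several independent entries, which is why a node reading `parameters` or a parent `params:` key would not see a change made only to the most deeply nested key.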

Olivia Lihn

02/13/2023, 2:26 PM
I use my specific keys.
Got it!
It worked!! Awesome! What I ended up doing is loading the parameters exactly as I'll use them (in this case `params:<pipeline_name>`). Thanks so much!
👍 1
🥳 2
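
One way to apply the advice from this thread is to build replacement values for every key form a node might consume, then hand them to the catalog in a single call. The sketch below is plain Python and makes several assumptions: `build_overrides` is a hypothetical helper, the `_fteng` suffix and `feature_inclusion_params` layout are taken from the question, and each matching entry is assumed to be a dictionary.

```python
import copy
from typing import Any, Dict, List


def build_overrides(
    params: Dict[str, Any], feature_list: List[str], suffix: str = "_fteng"
) -> Dict[str, Any]:
    """Return replacement entries for each key form a node might consume."""
    new_params = copy.deepcopy(params)  # leave the caller's dict untouched
    overrides: Dict[str, Any] = {}
    for name, value in new_params.items():
        if not name.endswith(suffix) or not isinstance(value, dict):
            continue
        inclusion = value.setdefault("feature_inclusion_params", {})
        inclusion["feature_list"] = list(feature_list)
        inclusion["enable_regex"] = True
        prefix = f"params:{name}"
        # Pipeline-level key, the form used in the thread's resolution.
        overrides[prefix] = value
        # Nested keys, for nodes that consume them directly.
        overrides[f"{prefix}.feature_inclusion_params"] = inclusion
        overrides[f"{prefix}.feature_inclusion_params.feature_list"] = inclusion["feature_list"]
        overrides[f"{prefix}.feature_inclusion_params.enable_regex"] = True
    # Whole dictionary, for nodes that consume "parameters".
    overrides["parameters"] = new_params
    return overrides


# Hypothetical parameters, for illustration only.
params = {
    "a_fteng": {"feature_inclusion_params": {"feature_list": [], "enable_regex": False}},
    "other": {"k": 1},
}
overrides = build_overrides(params, ["x", "y"])
print(sorted(overrides))
```

Inside `before_pipeline_run`, the result could then be passed to the method mentioned earlier in the thread, for example `catalog.add_feed_dict(overrides, replace=True)`; the `replace=True` argument is an assumption about the installed Kedro version and should be checked against its API.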

Nok Lam Chan

02/13/2023, 2:39 PM
Thank you @marrrcin, awesome help here!
😎 1
🙂 1

datajoely

02/13/2023, 4:47 PM
🙂 ❤️