#questions

IceAsher Chew

01/05/2024, 1:47 AM
Hi! RE: Kedro dataset factories
"{name}_data":
  type: pandas.CSVDataset
  filepath: data/01_raw/{name}_data.csv
When using Kedro dataset factories, kedro run seems to behave like a loop, handling one {name} at a time. How do I find out which {name} it is currently on from inside my node/function?
#ideally, I want to save the {name}
def function(dataset):
    current = {name}
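
For context, a factory pattern such as "{name}_data" is matched against the dataset names the pipeline references, and {name} is filled in from whichever name matches. The sketch below imitates that matching with the standard library only. `extract_name` is a hypothetical helper, not Kedro's API, and Kedro's own resolution logic may differ in its details.

```python
import re
from typing import Optional


def extract_name(pattern: str, dataset_name: str) -> Optional[str]:
    """Return the value of {name} if dataset_name matches pattern, else None.

    Hypothetical helper for illustration only; not part of Kedro.
    """
    # Escape the literal parts, then turn the {name} placeholder into a named group.
    regex = re.escape(pattern).replace(r"\{name\}", r"(?P<name>.+)")
    match = re.fullmatch(regex, dataset_name)
    return match.group("name") if match else None


print(extract_name("{name}_data", "foo_data"))    # foo
print(extract_name("{name}_data", "first_save"))  # None
```

So the name is recoverable from the dataset name itself, but a node only receives the loaded data, not the catalog entry name it came from.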

Deepyaman Datta

01/05/2024, 5:12 AM
I think this is hacky AF and highly recommend not doing this, but one way to accomplish this is to have a hook modify `dataset` to include the name... somewhere (because e.g. a pandas DataFrame is a Python object you can arbitrarily attach attributes to).
# src/<package_name>/hooks.py
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


def modify(df, name):
    assert name.endswith("_data")
    df.myname = name[: -len("_data")]
    return df


class HackyHooks:
    @hook_impl
    def before_node_run(self, inputs) -> None:
        inputs.update(
            {k: modify(v, k) for k, v in inputs.items() if k.endswith("_data")}
        )
# src/<package_name>/pipelines/<pipeline_name>/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline
from .nodes import function


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline([node(function, "foo_data", "first_save")])
# src/<package_name>/pipelines/<pipeline_name>/nodes.py
import pandas as pd


def function(dataset):
    return pd.DataFrame({"value": [dataset.myname]})
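
To see why this works without running Kedro, here is a minimal stand-alone sketch of the same flow. Assumptions: `FakeFrame` is a made-up stand-in for a pandas DataFrame (any object that accepts new attributes), the `params:threshold` key is invented for the example, and the hook is called by hand rather than by Kedro. In a real project the hook class also has to be registered in `src/<package_name>/settings.py`, e.g. `HOOKS = (HackyHooks(),)`.

```python
class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame; accepts arbitrary attributes."""


def modify(obj, name):
    # Strip the "_data" suffix so only the {name} part is stored.
    assert name.endswith("_data")
    obj.myname = name[: -len("_data")]
    return obj


def before_node_run(inputs):
    # Same filter as the hook above: only tag inputs whose key ends with "_data".
    for key, value in inputs.items():
        if key.endswith("_data"):
            modify(value, key)


def function(dataset):
    # The node reads the name that the hook attached.
    return dataset.myname


inputs = {"foo_data": FakeFrame(), "params:threshold": FakeFrame()}
before_node_run(inputs)
print(function(inputs["foo_data"]))  # foo
```

The trick relies on the hook mutating the very object the node later receives, which is part of why it is fragile: any step that copies or rebuilds the object will drop the attribute.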
👍 1

IceAsher Chew

01/05/2024, 5:51 AM
Thank you so much! I modified it a little and it works:
df.myname = name
👍 1