IceAsher Chew
01/05/2024, 1:47 AM
```yaml
"{name}_data":
  type: pandas.CSVDataset
  filepath: data/01_raw/{name}_data.csv
```
When using Kedro dataset factories, `kedro run` behaves like a loop, running one `{name}` at a time.
How do I output which `{name}` it is currently on inside my node/function?
```python
# Ideally, I want to save the {name}
def function(dataset):
    current = {name}  # pseudocode: {name} is not available inside the node
```
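In the catalog entry above, `{name}` is a placeholder that Kedro fills in when a pipeline refers to a concrete dataset such as `foo_data`; the node function itself only receives the loaded data, which is why the name isn't visible inside `function`. The sketch below imitates that matching with a plain regular expression so the mapping is easy to see. It is not Kedro's implementation, and `resolve`, `PATTERN` and `FILEPATH_TEMPLATE` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical, simplified imitation of the "{name}_data" factory entry.
# This is not Kedro's own matching code.
PATTERN = re.compile(r"^(?P<name>.+)_data$")
FILEPATH_TEMPLATE = "data/01_raw/{name}_data.csv"


def resolve(dataset_name: str) -> Optional[dict]:
    """Return the concrete catalog entry for dataset_name, or None if it doesn't match."""
    match = PATTERN.match(dataset_name)
    if match is None:
        return None
    # In this sketch, the captured {name} is only used to fill in the template.
    return {
        "type": "pandas.CSVDataset",
        "filepath": FILEPATH_TEMPLATE.format(name=match.group("name")),
    }


print(resolve("foo_data"))    # {'type': 'pandas.CSVDataset', 'filepath': 'data/01_raw/foo_data.csv'}
print(resolve("first_save"))  # None
```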
Deepyaman Datta
01/05/2024, 5:12 AM
`dataset` to include the name... somewhere (because, e.g., a pandas DataFrame is a Python object you can arbitrarily attach attributes to).
```python
# src/<package_name>/hooks.py
from kedro.framework.hooks import hook_impl


def modify(df, name):
    # "foo_data" -> df.myname == "foo"
    assert name.endswith("_data")
    df.myname = name[: -len("_data")]
    return df


class HackyHooks:
    @hook_impl
    def before_node_run(self, inputs: dict) -> dict:
        # Kedro uses a dict returned from this hook to update the node's
        # inputs, so every "*_data" input reaches the node tagged with its name.
        return {k: modify(v, k) for k, v in inputs.items() if k.endswith("_data")}
```
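For the hook to fire at all, it has to be registered in the project's `settings.py`, where Kedro reads hook instances from the `HOOKS` tuple. A minimal fragment, with `<package_name>` being the same placeholder as in the paths above:

```python
# src/<package_name>/settings.py
from <package_name>.hooks import HackyHooks

# Kedro calls every hook instance listed here; HackyHooks then runs
# before each node and tags its "*_data" inputs.
HOOKS = (HackyHooks(),)
```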
```python
# src/<package_name>/pipelines/<pipeline_name>/pipeline.py
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import function


def create_pipeline(**kwargs) -> Pipeline:
    # "foo_data" matches the "{name}_data" factory entry, so {name} is "foo"
    return pipeline([node(function, "foo_data", "first_save")])
```
```python
# src/<package_name>/pipelines/<pipeline_name>/nodes.py
import pandas as pd


def function(dataset):
    # myname was attached by HackyHooks.before_node_run
    return pd.DataFrame({"value": [dataset.myname]})
```
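To try the idea without setting up a Kedro project, the sketch below replays the flow by hand: a dict of inputs keyed by dataset name, a stand-in for `before_node_run` that returns replacement values (Kedro's hook spec allows `before_node_run` to return such a dict), and the node from `nodes.py`. `fake_before_node_run` and `fake_run_node` are hypothetical helpers; the real runner does much more (loading from the catalog, saving outputs, calling other hooks).

```python
import pandas as pd


def modify(df: pd.DataFrame, name: str) -> pd.DataFrame:
    # Same idea as the helper in hooks.py: "foo_data" -> df.myname == "foo".
    assert name.endswith("_data")
    df.myname = name[: -len("_data")]
    return df


def fake_before_node_run(inputs: dict) -> dict:
    # Hypothetical stand-in for HackyHooks.before_node_run: return replacement
    # values for every input whose dataset name ends in "_data".
    return {k: modify(v, k) for k, v in inputs.items() if k.endswith("_data")}


def function(dataset: pd.DataFrame) -> pd.DataFrame:
    # The node from nodes.py: it only sees the DataFrame, not the dataset name.
    return pd.DataFrame({"value": [dataset.myname]})


def fake_run_node(dataset_name: str, data: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical, much-simplified runner: build inputs, apply the hook, call the node.
    inputs = {dataset_name: data}
    inputs.update(fake_before_node_run(inputs))
    return function(inputs[dataset_name])


result = fake_run_node("foo_data", pd.DataFrame({"x": [1, 2]}))
print(result["value"].tolist())  # ['foo']
```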
IceAsher Chew
01/05/2024, 5:51 AM
`df.myname = name`
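Because `df.myname` is an ad-hoc attribute on one specific DataFrame object, it does not carry over to a new object derived from it, for example via `df.copy()`. pandas also provides `DataFrame.attrs`, a dict for dataset-level metadata (marked experimental in the pandas docs) that operations such as `copy()` pass along. A minimal sketch of storing the name there instead; `modify_with_attrs` and the `"dataset_name"` key are hypothetical, and a node would then read `dataset.attrs["dataset_name"]`.

```python
import pandas as pd


def modify_with_attrs(df: pd.DataFrame, name: str) -> pd.DataFrame:
    # Hypothetical variant of modify(): keep the full dataset name
    # (e.g. "foo_data") in DataFrame.attrs rather than as an ad-hoc attribute.
    df.attrs["dataset_name"] = name
    return df


df = pd.DataFrame({"x": [1, 2]})
df.myname = "foo_data"                  # ad-hoc attribute, as in the thread
df = modify_with_attrs(df, "foo_data")

copied = df.copy()
print(getattr(copied, "myname", None))  # None: the ad-hoc attribute is not copied
print(copied.attrs["dataset_name"])     # foo_data
```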