Abhishek Bhatia
02/18/2024, 6:15 PM
OmegaConfigLoader custom_resolver with dataset factories?
In settings.py, I define the following:
from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader


def split_dot_into_path(dot_str: str) -> str:
    return dot_str.replace(".", "/")


# Keyword arguments to pass to the `CONFIG_LOADER_CLASS` constructor.
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    # "config_patterns": {
    #     "spark": ["spark*/"],
    #     "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    # }
    "custom_resolvers": {
        "split_dot_into_path": split_dot_into_path,
    },
}
My pipeline looks like this:
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    nodes = [
        node(
            lambda x: x,
            inputs="raw_dataset",
            outputs="processed_dataset",
        )
    ]
    return pipeline(nodes, inputs="raw_dataset", namespace="level1.level2")
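For reference, the namespace changes which dataset names the catalog has to match. Below is a minimal stdlib sketch of that renaming, assuming Kedro's documented rule that datasets passed as inputs= keep their names while the other free datasets get the namespace as a dot prefix. apply_namespace is a hypothetical helper, not Kedro's implementation.

```python
def apply_namespace(name: str, namespace: str, keep: set) -> str:
    """Prefix a dataset name with the namespace unless it is explicitly kept."""
    # Hypothetical helper mirroring the renaming rule described above.
    if name in keep:
        return name
    return f"{namespace}.{name}"


namespace = "level1.level2"
keep = {"raw_dataset"}  # passed as inputs= to pipeline(), so not prefixed

for name in ["raw_dataset", "processed_dataset"]:
    print(name, "->", apply_namespace(name, namespace, keep))
```

So the catalog sees raw_dataset unchanged and level1.level2.processed_dataset for the output, which is the name the "{prefix}.processed_dataset" pattern has to match.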
And then the catalog entry looks like this:
raw_dataset:
  type: pandas.CSVDataset
  filepath: "data/01_raw/data.csv"

"{prefix}.processed_dataset":
  type: pandas.CSVDataset
  filepath: "data/03_processed/${split_dot_into_path:{prefix}}/data.csv"
So, basically, given any dot-delimited namespace (e.g. level1.level2), I want to build the nested folder structure at runtime by converting the namespace into a slash-delimited path (level1/level2).
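The desired behaviour, written as a minimal stdlib sketch: recover the prefix from the runtime dataset name and turn its dots into slashes. desired_filepath is a hypothetical helper that only states the goal; it is not something Kedro provides.

```python
def split_dot_into_path(dot_str: str) -> str:
    return dot_str.replace(".", "/")


def desired_filepath(dataset_name: str, suffix: str = ".processed_dataset") -> str:
    """Hypothetical: map 'level1.level2.processed_dataset' to a nested CSV path."""
    if not dataset_name.endswith(suffix):
        raise ValueError(f"{dataset_name!r} does not end with {suffix!r}")
    prefix = dataset_name[: -len(suffix)]  # e.g. 'level1.level2'
    return f"data/03_processed/{split_dot_into_path(prefix)}/data.csv"


print(desired_filepath("level1.level2.processed_dataset"))
# data/03_processed/level1/level2/data.csv
```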
Thanks! 🙂
Dmitry Sorokin
02/19/2024, 12:05 PM
Ankita Katiyar
02/19/2024, 12:54 PM
omegaconf config, including custom resolvers, happens before the dataset factories are evaluated in the process. We’ve had similar questions in the past, and we have an open issue collecting use cases if you’d like to add yours: https://github.com/kedro-org/kedro/issues/3086 (it’s a bit of a catch-all issue for now, but we’ll groom it soon).
Ankita Katiyar
02/19/2024, 12:55 PM
namespace = "level1/level2" without using the custom resolver work?
Nok Lam Chan
02/19/2024, 1:56 PM
node(xxx, ... outputs="data", namespace=level1.level2), it gets saved to level1/level2/data.csv?
I agree with @Ankita Katiyar that this is not currently possible, because it would require resolving the config much later than the current implementation does.
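The ordering problem can be illustrated with a toy model. It assumes two phases: (1) the config loader resolves ${resolver:arg} interpolations in the raw catalog, then (2) a dataset factory substitutes {prefix} once a concrete dataset name is requested. This is a hypothetical stdlib simulation, not Kedro or OmegaConf code; the real OmegaConf grammar may reject or parse {prefix} differently inside a resolver argument. The point is only that the resolver never sees level1.level2.

```python
import re


# Hypothetical stand-ins for the two phases; this is not Kedro or OmegaConf code.
def split_dot_into_path(dot_str: str) -> str:
    return dot_str.replace(".", "/")


RESOLVERS = {"split_dot_into_path": split_dot_into_path}


def resolve_interpolations(text: str) -> str:
    """Phase 1 (simulated config loading): apply ${name:arg} resolvers."""
    def _call(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        return RESOLVERS[name](arg)
    # arg is either a literal {placeholder} or plain text without braces.
    return re.sub(r"\$\{(\w+):(\{\w+\}|[^{}]*)\}", _call, text)


def fill_factory(template: str, **captured: str) -> str:
    """Phase 2 (simulated dataset factory): substitute {placeholders}."""
    return template.format(**captured)


raw = "data/03_processed/${split_dot_into_path:{prefix}}/data.csv"
after_loader = resolve_interpolations(raw)
print(after_loader)  # data/03_processed/{prefix}/data.csv
final = fill_factory(after_loader, prefix="level1.level2")
print(final)  # data/03_processed/level1.level2/data.csv
```

In this model the resolver only ever receives the placeholder text, so the dots survive into the final path; getting level1/level2 would require running the resolver after phase 2.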
Alternatively, I think this can be simplified by using an explicit nested namespace pattern:
"{level1}.{level2}.processed_dataset":
  type: pandas.CSVDataset
  filepath: "data/03_processed/{level1}/{level2}/data.csv"
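A minimal sketch of how the nested-placeholder pattern gets matched and filled in, assuming a regex translation of the {name} placeholders. Kedro's dataset factories are built on the parse library, so this stdlib version is only an approximation; pattern_to_regex and resolve_entry are hypothetical helpers.

```python
import re
from typing import Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '{level1}.{level2}.processed_dataset' into a regex with named groups."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Even indices are literal text; odd indices are placeholder names.
        regex += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
    return re.compile(regex)


def resolve_entry(pattern: str, filepath: str, dataset_name: str) -> Optional[str]:
    """Return the filled-in filepath, or None if the name does not match."""
    match = pattern_to_regex(pattern).fullmatch(dataset_name)
    if match is None:
        return None
    return filepath.format(**match.groupdict())


pattern = "{level1}.{level2}.processed_dataset"
filepath = "data/03_processed/{level1}/{level2}/data.csv"
print(resolve_entry(pattern, filepath, "level1.level2.processed_dataset"))
# data/03_processed/level1/level2/data.csv
```

In this sketch a deeper name such as a.b.c.processed_dataset still matches, but the lazy groups push the extra dot into level2 (data/03_processed/a/b.c/data.csv), so the explicit pattern only works cleanly for a fixed namespace depth.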
Abhishek Bhatia
02/20/2024, 7:02 AM
Abhishek Bhatia
02/20/2024, 7:13 AM
NamespacedPandasCSVDataset to just alter the filepath from being dot-delimited (.) to path-like (/).
Catalog entry:
"{prefix}.namespaced_dataset":
  type: demo.namespaced_dataset.NamespacedPandasCSVDataset
  base_path: "data/03_processed"
  namespace: "{prefix}"
  fname: "data.csv"
And my custom dataset looks like this:
import os
from typing import Any

from kedro.io.core import Version
from kedro_datasets import pandas


class NamespacedPandasCSVDataset(pandas.CSVDataset):
    """CSVDataset whose filepath is built from a dot-delimited namespace."""

    def __init__(
        self,
        *,
        base_path: str,
        namespace: str,
        fname: str,
        load_args: dict[str, Any] = None,
        save_args: dict[str, Any] = None,
        version: Version = None,
        credentials: dict[str, Any] = None,
        fs_args: dict[str, Any] = None,
        metadata: dict[str, Any] = None,
    ) -> None:
        # e.g. namespace "level1.level2" -> "data/03_processed/level1/level2/data.csv"
        filepath = self._get_full_filepath(base_path, namespace, fname)
        super().__init__(
            filepath=filepath,
            load_args=load_args,
            save_args=save_args,
            version=version,
            credentials=credentials,
            fs_args=fs_args,
            metadata=metadata,
        )
        self.base_path = base_path
        self.namespace = namespace
        self.fname = fname

    def _get_full_filepath(self, base_path, namespace, fname):
        # Join the base path, the namespace as nested folders, and the file name.
        return os.path.join(
            base_path,
            self.split_dot_into_path(namespace),
            fname,
        )

    @staticmethod
    def split_dot_into_path(dot_str: str) -> str:
        return dot_str.replace(".", "/")
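To check the path logic without installing kedro-datasets, here is the same helper pulled out as a plain function. It swaps os.path.join for posixpath.join, an assumed tweak so the path keeps forward slashes on Windows as well; the namespace value is what the {prefix} placeholder would capture for level1.level2.namespaced_dataset.

```python
import posixpath


def split_dot_into_path(dot_str: str) -> str:
    return dot_str.replace(".", "/")


def get_full_filepath(base_path: str, namespace: str, fname: str) -> str:
    """Like NamespacedPandasCSVDataset._get_full_filepath, but via posixpath."""
    # posixpath.join always uses '/', unlike os.path.join on Windows.
    return posixpath.join(base_path, split_dot_into_path(namespace), fname)


# What the factory would pass in for 'level1.level2.namespaced_dataset'.
print(get_full_filepath("data/03_processed", "level1.level2", "data.csv"))
# data/03_processed/level1/level2/data.csv
```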
Abhishek Bhatia
02/20/2024, 7:14 AM
Abhishek Bhatia
02/20/2024, 7:31 AM
.name doesn't seem to be present.