Is there built-in support (similar to versioned datasets) for automatically creating a new folder for each Kedro pipeline run, so that each run is saved separately? I'd like to end up with something like the following:
```
...
├── 2024-04-17-test-run2
│   ├── 01_raw            <-- Raw immutable data
│   ├── 02_intermediate   <-- Typed data
│   ├── 03_primary        <-- Domain model data
│   ├── 04_feature        <-- Model features
│   ├── 05_model_input    <-- Often called 'master tables'
│   ├── 06_models         <-- Serialised models
│   ├── 07_model_output   <-- Data generated by model runs
│   ├── 08_reporting      <-- Ad hoc descriptive cuts
├── 2023-03-01-test-run1
│   ├── 01_raw            <-- Raw immutable data
│   ├── 02_intermediate   <-- Typed data
│   ├── 03_primary        <-- Domain model data
│   ├── 04_feature        <-- Model features
│   ├── 05_model_input    <-- Often called 'master tables'
│   ├── 06_models         <-- Serialised models
│   ├── 07_model_output   <-- Data generated by model runs
│   ├── 08_reporting      <-- Ad hoc descriptive cuts
...
```
I don't believe there is such a feature, and it wouldn't make sense for inputs anyway (a per-run 01_raw doesn't fit, since all inputs would necessarily come from outside the run). Nothing stops you from doing this yourself for outputs, though: you can register a custom resolver with the `OmegaConfigLoader` that generates a timestamp to use in paths. Example (a sketch):
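In the catalog (`catalog.yml`):

```yaml
entry:
  type: pandas.CSVDataset  # hypothetical; use whatever dataset type this entry needs
  filepath: data/run-${timestamp:${globals:run_timestamp}}/02_intermediate/...
```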
In `globals.yml`:

```yaml
run_timestamp: ""  # empty means "generate a fresh timestamp for this run"
```
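In `settings.py`:

```python
from datetime import datetime
from functools import lru_cache

from kedro.config import OmegaConfigLoader


@lru_cache(maxsize=None)  # cache so every catalog entry resolves to the same timestamp
def get_timestamp(stamp):
    # A non-empty run_timestamp from globals wins, so an older run can be reused
    if stamp:
        return stamp
    # isoformat() contains colons, which are invalid in Windows paths; use dashes
    return datetime.now().strftime("%Y-%m-%dT%H-%M-%S")


CONFIG_LOADER_CLASS = OmegaConfigLoader  # already the default in Kedro 0.19
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "timestamp": get_timestamp,
    }
}
```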
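To check the caching behaviour outside of Kedro, here is a stdlib-only sketch that mimics how the resolver fills in catalog paths. `run_path` is a hypothetical helper standing in for the `data/run-${timestamp:...}/...` interpolation; it is not a Kedro API, and the dash-separated timestamp format is an assumption chosen to be path-safe.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def get_timestamp(stamp: str = "") -> str:
    # An explicit stamp is returned unchanged, so an older run can be reused.
    if stamp:
        return stamp
    # Computed on the first call only; lru_cache returns the same string for
    # every later call with the same argument, so all paths share one folder.
    return datetime.now().strftime("%Y-%m-%dT%H-%M-%S")


def run_path(layer: str, filename: str, stamp: str = "") -> str:
    # Hypothetical helper mimicking data/run-${timestamp:...}/<layer>/<filename>
    return f"data/run-{get_timestamp(stamp)}/{layer}/{filename}"


if __name__ == "__main__":
    print(run_path("02_intermediate", "cleaned.csv"))
    print(run_path("06_models", "model.pkl"))
    print(run_path("06_models", "model.pkl", "2024-04-17-test-run2"))
```

Without the cache, each interpolation could call `datetime.now()` separately, and entries resolved a second apart would end up in different run folders.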
Then overwrite (or provide) `run_timestamp` with an existing value if you want to reuse or read from an older run.