Is there in built support similar to versioned datasets to a Kedro #questions

Sergey S

04/17/2024, 5:15 PM

Is there built-in support (similar to versioned datasets) for automatically creating a new folder for each Kedro pipeline run, so that each run's data is saved separately? I'd like to get something like the following:

...
├── 2024-04-17-test-run2
│   ├── 01_raw            <-- Raw immutable data
│   ├── 02_intermediate   <-- Typed data
│   ├── 03_primary        <-- Domain model data
│   ├── 04_feature        <-- Model features
│   ├── 05_model_input    <-- Often called 'master tables'
│   ├── 06_models         <-- Serialised models
│   ├── 07_model_output   <-- Data generated by model runs
│   ├── 08_reporting      <-- Ad hoc descriptive cuts
├── 2023-03-01-test-run1
│   ├── 01_raw            <-- Raw immutable data
│   ├── 02_intermediate   <-- Typed data
│   ├── 03_primary        <-- Domain model data
│   ├── 04_feature        <-- Model features
│   ├── 05_model_input    <-- Often called 'master tables'
│   ├── 06_models         <-- Serialised models
│   ├── 07_model_output   <-- Data generated by model runs
│   ├── 08_reporting      <-- Ad hoc descriptive cuts
...

Artur Dobrogowski

04/17/2024, 8:00 PM

I don't believe there is such a feature, as it would not make sense for the inputs (a per-run 01_raw would not make sense, since all inputs would necessarily be external). Nothing stops you from doing this yourself for the outputs, though: you can register a custom resolver function with the OmegaConf config loader (OmegaConfigLoader) that generates a timestamp to use in paths. Example (pseudocode):

in catalog:

entry:
  filepath: data/run-${timestamp:${globals:run_timestamp}}/02_intermediate/...

in globals:

run_timestamp: ""

in settings:

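from datetime import datetime
from functools import lru_cache

# Cache the result so every catalog entry in one run resolves to the same timestamp.
@lru_cache(maxsize=None)
def get_timestamp(stamp):
    if stamp:
        return stamp  # reuse an explicitly provided timestamp
    return datetime.now().isoformat()

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "timestamp": get_timestamp,
    }
}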

Artur Dobrogowski

04/17/2024, 8:04 PM

And override/provide run_timestamp (e.g. in globals) if you want to reuse or read from an older run.
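
To illustrate how the cached resolver behaves outside of Kedro, here is a minimal standalone sketch using only the standard library. It assumes `functools.lru_cache` stands in for the `@cached` decorator in the pseudocode; `run_path` is a hypothetical helper that only mimics the catalog pattern `data/run-<timestamp>/<layer>/...`, and the `strftime` format is my own substitution for `isoformat()` to avoid colons in folder names. It is not taken from the Kedro API.

```python
from datetime import datetime
from functools import lru_cache
from pathlib import PurePosixPath


@lru_cache(maxsize=None)
def get_timestamp(stamp: str = "") -> str:
    """Return `stamp` if given, else a timestamp fixed for this process."""
    if stamp:
        return stamp
    # Assumption: dashes instead of the colons produced by isoformat(),
    # since colons are not valid in Windows folder names.
    return datetime.now().strftime("%Y-%m-%dT%H-%M-%S")


def run_path(stamp: str, layer: str, name: str) -> str:
    # Hypothetical helper mirroring data/run-<timestamp>/<layer>/<name>.
    return str(PurePosixPath("data") / f"run-{get_timestamp(stamp)}" / layer / name)


# New run: the empty stamp is cached, so all datasets share one run folder.
first = run_path("", "02_intermediate", "a.csv")
second = run_path("", "03_primary", "b.csv")
assert first.split("/")[1] == second.split("/")[1]

# Reusing an older run: pass that run's stamp explicitly.
old = run_path("2024-04-17-test-run2", "02_intermediate", "a.csv")
print(old)  # data/run-2024-04-17-test-run2/02_intermediate/a.csv
```

The caching is what keeps a single run consistent: without it, each catalog entry could resolve to a slightly different timestamp and end up in a different folder.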
