Hi team beginner question is it possible to create a partiti Kedro #questions


Luis Cano

11/18/2022, 4:53 AM

Hi team, beginner question: is it possible to create a partitioned dataset that stores many pickled models, e.g.:


s3_path/train_outputs/
                 ├── 202201.pkl
                 ├── 202202.pkl
                 ├── 202203.pkl
                 ├── 202204.pkl

...

so on.

Is there a way of saving pickles this way? Or is there a better way of doing this? Any thoughts?

datajoely

11/18/2022, 10:07 AM

So enabling versioned: true will do this on a normal dataset, but we don't currently give you control over the timestamp format, so you'll get one that is a lot more precise than month level.
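To make the precision difference concrete, here is a minimal sketch that contrasts a month-level partition name with a millisecond-level version timestamp. It assumes the `YYYY-MM-DDThh.mm.ss.sssZ` layout described in the Kedro versioning docs; the helper names are hypothetical and this is not Kedro's own code.

```python
from datetime import datetime, timezone


def versioned_style_timestamp(moment: datetime) -> str:
    """Format a datetime in a YYYY-MM-DDThh.mm.ss.sssZ layout.

    Assumption: this mirrors the version timestamp layout shown in the
    Kedro docs; it is an illustration, not Kedro's implementation.
    """
    ts = moment.strftime("%Y-%m-%dT%H.%M.%S.%fZ")
    # Trim microseconds down to milliseconds, keeping the trailing "Z".
    return ts[:-4] + ts[-1:]


def month_partition_id(moment: datetime) -> str:
    """Month-level name, as in the 202201.pkl style from the question."""
    return moment.strftime("%Y%m")


moment = datetime(2022, 11, 18, 10, 7, 0, 123456, tzinfo=timezone.utc)
print(versioned_style_timestamp(moment))  # 2022-11-18T10.07.00.123Z
print(month_partition_id(moment))         # 202211
```

Each versioned save gets its own timestamped entry, so it tracks when a save happened rather than which period a model belongs to.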

Yetunde

11/18/2022, 10:16 AM

Here's where we describe versioning in our documentation: https://kedro.readthedocs.io/en/stable/data/data_catalog.html#version-datasets-and-ml-models

Luis Cano

11/18/2022, 3:27 PM

The closest I found to how it was being saved before (in a function that loops over each period and pickles the model) is this catalog entry:


model_name:
   type: PartitionedDataSet
   path: "s3_path/folder_name/"
   dataset:
      type: pickle.PickleDataSet
   filename_suffix: ".pkl"

Since the function was already looping over each period, I collect every model in a dictionary, and with this catalog definition the Data Catalog saves them the same way as before. It might not be the most optimal approach, though!
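A minimal sketch of the node side of this approach, assuming the node output is mapped to the `model_name` catalog entry: the function returns a dictionary whose keys become partition names, so `PartitionedDataSet` would write one file per key with the `.pkl` suffix appended. `train_models_by_period` and `dummy_train` are hypothetical names, and the dict-based "model" is a stand-in for a real estimator.

```python
import pickle
from typing import Any, Callable, Dict, Iterable


def train_models_by_period(
    periods: Iterable[str],
    train_fn: Callable[[str], Any],
) -> Dict[str, Any]:
    """Train one model per period and key each model by its period.

    When a dict like this is saved through a PartitionedDataSet, each key
    is used as the partition id (e.g. "202201" -> "202201.pkl" with
    filename_suffix ".pkl").
    """
    return {str(period): train_fn(period) for period in periods}


def dummy_train(period: str) -> Dict[str, Any]:
    """Hypothetical stand-in for real training; returns a picklable object."""
    return {"period": period, "coef": [0.0]}


models = train_models_by_period(["202201", "202202", "202203"], dummy_train)

# Every value must be picklable, since pickle.PickleDataSet serialises it.
for partition_id, model in models.items():
    pickle.dumps(model)

print(sorted(models))  # ['202201', '202202', '202203']
```

Keeping the period in the dictionary key means the file names stay under your control, which is what plain versioning does not offer.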
