# questions
g
hello, what is the proper way to add a current timestamp to the names of catalog entries? thanks
h
Someone will reply to you shortly. In the meantime, this might help:
r
Hi @Gauthier Pierard, if your goal is to version your dataset, you can set
versioned: True
in the catalog entry. This will save your datasets with a timestamp-based version for each kedro run. https://docs.kedro.org/en/stable/data/data_catalog.html#dataset-versioning
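For reference, a minimal catalog entry with versioning enabled might look like this (the dataset name and filepath are made up for illustration):

```yaml
# conf/base/catalog.yml
my_dataset:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/my_dataset.csv
  versioned: true
```

Each `kedro run` then saves to `data/02_intermediate/my_dataset.csv/<timestamp>/my_dataset.csv`.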
g
thanks Rashida but I actually need more control. In general my save folders look like `output_folder_<parameter>_<from_date>_<to_date>`, where `from_date` and `to_date` are computed by a node and saved as MemoryDatasets in the catalog. Is it possible to define other catalog entries whose names depend on previous entries?
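The naming scheme described above can be sketched in plain Python; the parameter and date values here are hypothetical, and in a real pipeline the dates would come from the upstream node's MemoryDataset outputs:

```python
from datetime import date


def build_output_folder(parameter: str, from_date: date, to_date: date) -> str:
    """Build a folder name like output_folder_<parameter>_<from_date>_<to_date>.

    All names are illustrative; they stand in for values produced upstream.
    """
    return f"output_folder_{parameter}_{from_date.isoformat()}_{to_date.isoformat()}"


folder = build_output_folder("speed", date(2024, 1, 1), date(2024, 3, 31))
# e.g. "output_folder_speed_2024-01-01_2024-03-31"
```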
m
If I understand this correctly you'd essentially like to dynamically create your catalog based on previous runs?
g
indeed. I suppose this is best done in python with something like

```python
from kedro_datasets.pandas import CSVDataset

dataset = CSVDataset(
    filepath="s3://test_bucket/data/02_intermediate/company/motorbikes.csv",
    load_args=dict(sep=",", skiprows=5, skipfooter=1, na_values=["#NA", "NA"]),
    credentials=dict(key="token", secret="key"),
)
```
and

```python
# save the dataset to data/01_raw/test.csv/<version>/test.csv
catalog.save("test_dataset", data1)
```
correct?
m
The above lets you save the data, but it doesn't add the dataset entry to the catalog itself.
Do you need to have the catalog for future processing or are you okay with just saving the data to storage?
g
Yes, I understand the catalog file won't be updated, only the catalog object in memory. However, could I define a PartitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?
r
You could possibly use the OmegaConfigLoader, register a custom resolver in settings.py, and then define your catalog filepath as
`filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv`
Here is an example code - https://github.com/kedro-org/kedro/issues/2355#issuecomment-2260512795
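A sketch of what such a resolver could look like. The registration lines are commented out and assume Kedro's `OmegaConfigLoader` accepts a `custom_resolvers` argument (true in recent Kedro versions, but verify against your version):

```python
from datetime import datetime


def now_resolver(fmt: str = "%Y-%m-%dT%H-%M-%S") -> str:
    # Return the current time formatted for use in a file name.
    return datetime.now().strftime(fmt)


# Hypothetical registration in settings.py:
# from kedro.config import OmegaConfigLoader
# CONFIG_LOADER_CLASS = OmegaConfigLoader
# CONFIG_LOADER_ARGS = {"custom_resolvers": {"now": now_resolver}}
```

With that in place, `${now:}` in catalog.yml would be replaced at config-load time.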
g
hmm, this seems to involve the file `datasets.py`, with which I am not familiar. thanks for the idea in any case
r
You can ignore that file! It was just an example
👍 1
m
> However could I define a partitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?
As far as I know this should be possible, because for the load path you only provide the top-level directory: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#partitioned-dataset-load
👍 1
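A catalog sketch of that idea: a PartitionedDataset pointed at the parent directory, which lazily loads every matching file underneath it. Names and paths are made up, and the `partitions.PartitionedDataset` type string assumes a recent kedro-datasets release:

```yaml
# conf/base/catalog.yml
all_outputs:
  type: partitions.PartitionedDataset
  path: data/07_model_output/   # parent dir containing the dynamic subfolders
  dataset: pandas.CSVDataset
  filename_suffix: ".csv"
```

Loading `all_outputs` returns a dict mapping each partition id (the relative path) to a callable that loads that partition on demand.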