# questions
g
hello, what is the proper way to add a current timestamp to the names of catalog entries? thanks
h
Someone will reply to you shortly. In the meantime, this might help:
r
Hi @Gauthier Pierard, if your goal is to version your dataset, you can set
versioned: True
in the catalog entry. This will save your datasets with a timestamp-based version for each kedro run. https://docs.kedro.org/en/stable/data/data_catalog.html#dataset-versioning
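For reference, a minimal catalog entry with versioning enabled might look like this (the dataset name and filepath are made up for illustration):

```yaml
# conf/base/catalog.yml
my_dataset:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/my_dataset.csv
  versioned: true
```

Each `kedro run` then saves to `data/02_intermediate/my_dataset.csv/<timestamp>/my_dataset.csv`.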
g
thanks Rashida but I actually need more control. In general my save folders look like `output_folder_<parameter>_<from_date>_<to_date>`, where `from_date` and `to_date` are computed by a node and saved as MemoryDatasets in the catalog. Is it possible to define other catalog entries whose names depend on previous entries?
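The naming scheme described above can be sketched in plain Python; the parameter and date values here are hypothetical, and in a real pipeline the dates would come from the upstream node's MemoryDataset outputs:

```python
from datetime import date


def build_output_folder(parameter: str, from_date: date, to_date: date) -> str:
    """Build a folder name like output_folder_<parameter>_<from_date>_<to_date>.

    All names are illustrative; they stand in for values produced upstream.
    """
    return f"output_folder_{parameter}_{from_date.isoformat()}_{to_date.isoformat()}"


folder = build_output_folder("speed", date(2024, 1, 1), date(2024, 3, 31))
# e.g. "output_folder_speed_2024-01-01_2024-03-31"
```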
m
If I understand this correctly you'd essentially like to dynamically create your catalog based on previous runs?
g
indeed. I suppose this is best done in python with something like

```python
from kedro_datasets.pandas import CSVDataset

dataset = CSVDataset(
    filepath="s3://test_bucket/data/02_intermediate/company/motorbikes.csv",
    load_args=dict(sep=",", skiprows=5, skipfooter=1, na_values=["#NA", "NA"]),
    credentials=dict(key="token", secret="key"),
)
```
and

```python
# save the dataset to data/01_raw/test.csv/<version>/test.csv
catalog.save("test_dataset", data1)
```
correct?
m
The above lets you save the data, but it doesn't add the dataset entry to the catalog itself.
Do you need to have the catalog for future processing or are you okay with just saving the data to storage?
g
Yes, I understand the catalog file won't be updated, only the catalog object in memory. However, could I define a PartitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?
r
You could possibly use the OmegaConfigLoader, register a custom resolver in settings.py, and then define your catalog filepath as
`filepath: data/02_intermediate/pypi_kedro_demo_${now:}.csv`
Here is an example code - https://github.com/kedro-org/kedro/issues/2355#issuecomment-2260512795
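A sketch of what such a resolver could look like. The registration lines are commented out and assume Kedro's `OmegaConfigLoader` accepts a `custom_resolvers` argument (true in recent Kedro versions, but verify against your version):

```python
from datetime import datetime


def now_resolver(fmt: str = "%Y-%m-%dT%H-%M-%S") -> str:
    # Return the current time formatted for use in a file name.
    return datetime.now().strftime(fmt)


# Hypothetical registration in settings.py:
# from kedro.config import OmegaConfigLoader
# CONFIG_LOADER_CLASS = OmegaConfigLoader
# CONFIG_LOADER_ARGS = {"custom_resolvers": {"now": now_resolver}}
```

With that in place, `${now:}` in catalog.yml would be replaced at config-load time.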
g
hmm, this seems to involve the file `datasets.py`, with which I am not familiar. thanks for the idea in any case
r
You can ignore that file! It was just an example
👍 1
m
> However could I define a partitionedDataset at the parent directory that would load the dynamically generated output paths and files for future computations?
As far as I know this should be possible, because for the load path you only provide the top-level directory: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#partitioned-dataset-load
👍 1
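A catalog sketch of that idea: a PartitionedDataset pointed at the parent directory, which lazily loads every matching file underneath it. Names and paths are made up, and the `partitions.PartitionedDataset` type string assumes a recent kedro-datasets release:

```yaml
# conf/base/catalog.yml
all_outputs:
  type: partitions.PartitionedDataset
  path: data/07_model_output/   # parent dir containing the dynamic subfolders
  dataset: pandas.CSVDataset
  filename_suffix: ".csv"
```

Loading `all_outputs` returns a dict mapping each partition id (the relative path) to a callable that loads that partition on demand.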