# questions
m
Dear Kedro team, is there a canonical Kedro way to timestamp datasets? Situation: to compare results across model runs, we need a "model run start timestamp" in the files, so that all files from the same model run carry the same value. That way we can easily append the files when needed and display them in Tableau, e.g. as lines with different colors based on the timestamp column. Since the final visualizations join different files, it is a must for us to have the "model start timestamp" and not a "file created" or "node started" timestamp. Aside from coding this manually, is there a Kedro way of placing the "model start timestamp" into the dataframes, one that would still work when a partial pipeline or an individual node is run? The model has 30+ nodes, so I'd love to minimize the edits required in every node. Thank you!
j
hi @Martin Dekar! does our dataset versioning work for your use case? https://docs.kedro.org/en/stable/data/data_catalog.html#dataset-versioning
m
Hi Juanlu, thanks for the reply. It does not, as dataset versioning uses a "...version string formatted as `YYYY-MM-DDThh.mm.ss.sssZ`", created at the moment the data is output. If our model runs for 20 minutes and gradually outputs files, the timestamps will differ between the files. This would work if we could hard-set the timestamp to the model start time, which I believe is not possible (I'm no expert on Kedro hooks, though). Do you see any possibility of modifying the behavior of dataset versioning using e.g. existing Kedro configuration parameters? If not, can you think of another way aside from dataset versioning? Your support is truly appreciated!
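To make the distinction concrete, here is a minimal plain-Python sketch of a timestamp that is fixed once per process and then reused, as opposed to one generated at each save. It is not Kedro code: `run_timestamp`, `stamp_rows`, and the `model_start_ts` column name are hypothetical, and the format only mirrors the version string quoted above.

```python
from datetime import datetime, timezone
from functools import lru_cache


@lru_cache(maxsize=1)
def run_timestamp() -> str:
    """Return one timestamp per process, fixed on the first call."""
    now = datetime.now(tz=timezone.utc)
    # Millisecond precision in the YYYY-MM-DDThh.mm.ss.sssZ layout.
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"


def stamp_rows(rows: list[dict], column: str = "model_start_ts") -> list[dict]:
    """Add the shared run timestamp to every row (hypothetical helper)."""
    ts = run_timestamp()
    return [{**row, column: ts} for row in rows]
```

Every call to `stamp_rows` within the same process yields the same value, no matter how much later it happens. A partial run started in a new process would get a new value, so reusing an earlier run's timestamp would need it to be passed in explicitly.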
j
hi @Martin Dekar, indeed for `AbstractVersionedDataset` the timestamp is generated the moment the data will be saved https://github.com/kedro-org/kedro/blob/df9f174864640de193b2b85f04d0c3e8aee7d22c/kedro/io/core.py#L563-L568 the only workaround I can think of is creating your own `CustomVersionedDataset` class, but then you'd have to re-implement your datasets inheriting from it. unfortunately there aren't many good options here. I already wrote a comment referencing this conversation in https://github.com/kedro-org/kedro/issues/2355, please feel free to upvote and also share your thoughts.
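The idea behind such a workaround can be sketched without Kedro: the version string is supplied once at construction and reused for every save, instead of being generated at save time. `FixedVersionJsonStore` below is a hypothetical stand-in, not a Kedro class and not a subclass of `AbstractVersionedDataset`; the directory layout is an assumption made for illustration only.

```python
import json
from pathlib import Path


class FixedVersionJsonStore:
    """Hypothetical store: the version is fixed at construction
    instead of being generated when data is saved."""

    def __init__(self, root: Path, version: str):
        self._root = Path(root)
        self._version = version

    def save(self, name: str, data: dict) -> Path:
        # Every save made through this instance lands under the same
        # version directory, regardless of when it happens.
        target = self._root / name / self._version / f"{name}.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(data))
        return target
```

A real `CustomVersionedDataset` would still have to plug into Kedro's dataset interface, which is the re-implementation cost mentioned above.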
m
Thank you, Juanlu. Much appreciated. I'll explore the CustomVersionedDataset route. Unfortunately, I haven't found a way to upvote the issue/comment, aside from giving it a thumbs-up emoji. I know one can upvote a GitHub Discussion item, but I don't see an option to do the same for an Issue.
j
no other upvote mechanism outside of 👍🏼 reaction 😁 (on the first comment, so it gets counted in sorting)
m
haha.. love it! Thanks. Will do. Wish you a nice week!