# questions
j
Hi Team, I have a daily job that saves several versioned datasets. A downstream process (not written in Kedro) needs the <version> (e.g. data/01_raw/company/cars.csv/<version>/cars.csv) so that it can pick up the correct datasets to process. Is there a way to know which <version> Kedro is using?
r
Just to clarify — when you say "which version is being used by Kedro," do you mean during loading or saving? If you're loading a versioned dataset and haven't specified a version, Kedro will automatically load the latest version available. If you're saving data and not setting a version manually, Kedro will auto-generate a timestamped version at runtime.
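For a downstream process that only sees the files on disk, the "latest version" lookup described above can be approximated without Kedro. This is a minimal sketch, assuming the directory layout from the question (`<dataset path>/<version>/<filename>`) and that version names sort chronologically as plain strings. The helper name `latest_version` is hypothetical and is not part of Kedro.

```python
from pathlib import Path
from typing import Optional


def latest_version(dataset_dir: Path) -> Optional[str]:
    """Return the newest version folder name under dataset_dir, or None.

    Assumes the layout <dataset_dir>/<version>/<dataset_dir.name>, e.g.
    data/01_raw/company/cars.csv/<version>/cars.csv.
    """
    if not dataset_dir.is_dir():
        return None
    filename = dataset_dir.name
    # Keep only version folders that actually contain the saved file.
    versions = sorted(
        child.name
        for child in dataset_dir.iterdir()
        if (child / filename).is_file()
    )
    # Timestamp-style names sort chronologically, so the last one is the newest.
    return versions[-1] if versions else None
```

Calling `latest_version(Path("data/01_raw/company/cars.csv"))` would return the newest version string found on disk, which is not necessarily the one written by the run you care about if several runs overlap.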
j
Hi @Rashida Kanchwala, let me clarify. I need to write the timestamped version string (YYYY-MM-DDThh.mm.ss.sssZ) that Kedro uses when saving to an external database.
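If the external database should store the version as a real timestamp rather than a string, the format quoted above can be handled with the standard library. This is a sketch under the assumption that the string is a UTC time with millisecond precision, as the pattern YYYY-MM-DDThh.mm.ss.sssZ suggests. The names `format_version` and `parse_version` are hypothetical helpers, not Kedro functions.

```python
from datetime import datetime, timezone

# strftime/strptime pattern matching YYYY-MM-DDThh.mm.ss.sssZ
# (%f yields six digits when formatting and accepts one to six when parsing).
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def format_version(moment: datetime) -> str:
    """Render a timezone-aware datetime as a version-style string in UTC."""
    stamp = moment.astimezone(timezone.utc).strftime(VERSION_FORMAT)
    # Drop the last three microsecond digits and put the trailing "Z" back.
    return stamp[:-4] + "Z"


def parse_version(version: str) -> datetime:
    """Parse a version-style string back into a UTC datetime."""
    parsed = datetime.strptime(version, VERSION_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)
```

Storing the raw string alongside the parsed timestamp keeps the exact folder name available for building the path later.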
r
OK, you might be able to use Kedro hooks to capture the <version> being saved. Something like before_dataset_saved could help with that: https://docs.kedro.org/en/stable/hooks/index.html
j
Could you tell me which attributes should be used to capture the version?
@hook_impl
def after_dataset_saved(self, dataset_name: str, node: Node) -> None:
r
Let me get back to you on this. I just realized the hook only gives access to the dataset_name, which doesn't include the version info directly. Maybe the after_catalog_created hook is the one to use. There might be another way to access it, so I'm looking into that now. Will update you soon!
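One way the after_catalog_created idea could look is sketched below. It relies on that hook receiving a `save_version` argument, and on hook implementations being allowed to declare only the arguments they need. It assumes the datasets are declared `versioned: true` and use the run-wide save version rather than a pinned one. The class name `SaveVersionHooks` and the `saved_versions` dict are hypothetical, and the dict stands in for the external database write. The import fallback exists only so the sketch also runs where Kedro is not installed.

```python
from typing import Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # assumption: no-op stand-in so the sketch runs without Kedro
    def hook_impl(func):
        return func


class SaveVersionHooks:
    """Remember the run's save version and which datasets were saved with it."""

    def __init__(self) -> None:
        self.save_version: Optional[str] = None
        self.saved_versions: Dict[str, Optional[str]] = {}

    @hook_impl
    def after_catalog_created(self, save_version: str) -> None:
        # Called once the catalog exists; keep the version string for later.
        self.save_version = save_version

    @hook_impl
    def after_dataset_saved(self, dataset_name: str) -> None:
        # Hypothetical sink: replace this dict with the external database write.
        self.saved_versions[dataset_name] = self.save_version
```

In a Kedro project the class would be registered in settings.py with `HOOKS = (SaveVersionHooks(),)`. Non-versioned datasets also trigger after_dataset_saved, so a real implementation would likely filter by dataset name.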
j
It makes sense. Thank you @Rashida Kanchwala. You are awesome!