Hello I am using namespaced incremental datasets and it seem Kedro #questions

Guillaume Tauzin

09/09/2024, 12:58 PM

Hello, I am using namespaced incremental datasets and it seems that running my pipeline does not create any CHECKPOINT file (even if I specify a checkpoint filepath). My understanding is that if I do not specify a `confirms`, then the execution of the node whose input is my incremental dataset should result in the creation/update of the CHECKPOINT file. This does not happen here. Running my pipeline N times processes the available data N times. Introducing a `confirms` in the nodes of my pipeline does not change this behavior. The logger does print that the incremental dataset has been confirmed, but no CHECKPOINT file is generated or updated. Calling `catalog.confirm("NAMESPACE.INCREMENTALDATASET")` in Jupyter actually creates the checkpoint file. Any hint on what might be happening? How can I investigate this further?

Huong Nguyen

09/09/2024, 1:37 PM

Hey @Guillaume Tauzin, thanks for reaching out. Let me look into this. In the meantime, can you share with us how you configure your incremental dataset?

Guillaume Tauzin

09/09/2024, 1:41 PM

Hi @Huong Nguyen, and thanks for your help. My dataset looks like this:

"{source}.my_increment_dataset":
  type: partitions.IncrementalDataset
  path: DATADIRPATH/{source}/
  dataset:
    type: ifd.datasets.TDMSDataset
  filename_suffix: ".tdms"

Note that TDMSDataset is a dataset I created to handle tdms files.

Guillaume Tauzin

09/09/2024, 1:42 PM

I also tried specifying a checkpoint path, but no checkpoint is generated either way:

checkpoint:
    filepath: DATADIRPATH/{source}/MY_CHECKPOINT

Elena Khaustova

09/09/2024, 11:04 PM

Hi @Guillaume Tauzin, do you see the same behaviour if you replace your `TDMSDataset` with a default one, for example `pandas.CSVDataset`?

Guillaume Tauzin

09/10/2024, 9:18 AM

Hi @Elena Khaustova, I can confirm that replacing the dataset type does not change the behavior. The CHECKPOINT file is still missing.

Guillaume Tauzin

09/10/2024, 9:25 AM

I am more inclined to believe that the problem is linked to the fact that my incremental dataset is namespaced. I have previously run into a related issue, https://github.com/kedro-org/kedro/issues/4039: `confirms` is not namespaced. At that time, @Nok Lam Chan suggested putting the namespace in the argument, e.g. confirms=namespace.data, as a workaround.

Guillaume Tauzin

09/10/2024, 9:31 AM

I just solved my problem. It seems that when you have a namespaced incremental dataset, you have to pass `confirms` with the namespace included, as @Nok Lam Chan suggested. I believe the default `confirms` is set to the incremental dataset name without the namespace, and a dataset with that name does not actually exist.

🥳 1

👍 1
