Hello, I am using namespaced incremental datasets...
# questions
g
Hello, I am using namespaced incremental datasets, and running my pipeline does not seem to create any CHECKPOINT file (even if I specify a checkpoint filepath). My understanding is that if I do not specify confirms, then executing the node whose input is my incremental dataset should create or update the CHECKPOINT file. This does not happen here: running my pipeline N times processes the available data N times. Adding confirms to the nodes of my pipeline does not change this behavior. The logger does print that the incremental dataset has been confirmed, but no CHECKPOINT file is generated or updated. Calling catalog.confirm("NAMESPACE.INCREMENTALDATASET") in Jupyter, however, does create the checkpoint file. Any hint on what might be happening? How can I investigate this further?
h
Hey @Guillaume Tauzin, thanks for reaching out. Let me look into this. In the meantime, can you share with us how you configure your incremental dataset?
g
Hi @Huong Nguyen, and thanks for your help. My dataset looks like this:
"{source}.my_increment_dataset":
  type: partitions.IncrementalDataset
  path: DATADIRPATH/{source}/
  dataset:
    type: ifd.datasets.TDMSDataset
  filename_suffix: ".tdms"
Note that TDMSDataset is a dataset I created to handle TDMS files.
I also tried specifying a checkpoint path, but no checkpoint is generated whether or not I set it:
  checkpoint:
    filepath: DATADIRPATH/{source}/MY_CHECKPOINT
e
Hi @Guillaume Tauzin, do you see the same behaviour if you replace your TDMSDataset with a default one, for example pandas.CSVDataset?
g
Hi @Elena Khaustova, I can confirm that replacing the dataset type does not change the behavior. The CHECKPOINT file is still missing.
I am more inclined to believe that the problem is linked to the fact that my incremental dataset is namespaced. I previously ran into a related issue: https://github.com/kedro-org/kedro/issues/4039
confirms is not namespaced, and at that time @Nok Lam Chan suggested putting the namespace in the argument, e.g. confirms=namespace.data, as a workaround.
I just solved my problem. It seems that when you have a namespaced incremental dataset, you have to pass confirms with the namespace included, as @Nok Lam Chan suggested. I believe the default confirms is set to the incremental dataset name without the namespace, and a dataset with that name does not actually exist.
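To illustrate the name mismatch described above, here is a toy sketch in plain Python. It is not Kedro's implementation: ToyIncrementalDataset, ToyCatalog, run_once, the "source_a" namespace and the partition names are all hypothetical, and the model only assumes that a confirmation is looked up by dataset name and that an unregistered name leaves the checkpoint untouched.

```python
class ToyIncrementalDataset:
    """Hypothetical stand-in: load() returns partitions newer than the checkpoint."""

    def __init__(self, partitions):
        self.partitions = sorted(partitions)
        self.checkpoint = None  # id of the last confirmed partition

    def load(self):
        return [
            p for p in self.partitions
            if self.checkpoint is None or p > self.checkpoint
        ]

    def confirm(self):
        # Record the newest partition as processed.
        if self.partitions:
            self.checkpoint = self.partitions[-1]


class ToyCatalog:
    """Hypothetical catalog keyed by the full (namespaced) dataset name."""

    def __init__(self, datasets):
        self.datasets = datasets

    def confirm(self, name):
        dataset = self.datasets.get(name)
        if dataset is None:
            return False  # assumption: an unknown name updates nothing
        dataset.confirm()
        return True


def run_once(catalog, input_name, confirms):
    """Load the input, then confirm whatever name was passed as confirms."""
    loaded = catalog.datasets[input_name].load()
    catalog.confirm(confirms)
    return loaded


full_name = "source_a.my_increment_dataset"
partitions = ["2024-01.tdms", "2024-02.tdms"]

# Confirming the name without the namespace: every run reprocesses everything.
catalog = ToyCatalog({full_name: ToyIncrementalDataset(partitions)})
first = run_once(catalog, full_name, confirms="my_increment_dataset")
second = run_once(catalog, full_name, confirms="my_increment_dataset")

# Confirming the namespaced name: the second run finds nothing new.
catalog = ToyCatalog({full_name: ToyIncrementalDataset(partitions)})
third = run_once(catalog, full_name, confirms=full_name)
fourth = run_once(catalog, full_name, confirms=full_name)
```

In this toy model, first and second both contain every partition, while fourth is empty because the checkpoint was written after the third run. In the real pipeline the equivalent change is the one from the thread: pass the namespaced dataset name as the node's confirms argument.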