Guillaume Tauzin
09/09/2024, 12:58 PM
confirms, then the execution of the node whose input is my incremental dataset should result in the creation/update of the CHECKPOINT file. This does not happen here. Running my pipeline N times will process the available data N times.
Introducing a confirms in the nodes of my pipeline does not change this behavior. The logger does print that the incremental dataset has been confirmed, but no CHECKPOINT file is generated or updated.
Calling catalog.confirm("NAMESPACE.INCREMENTALDATASET") in Jupyter does create the checkpoint file.
Any hint on what might be happening? How can I investigate this further?
Huong Nguyen
09/09/2024, 1:37 PM
Guillaume Tauzin
09/09/2024, 1:41 PM
"{source}.my_increment_dataset":
  type: partitions.IncrementalDataset
  path: DATADIRPATH/{source}/
  dataset:
    type: ifd.datasets.TDMSDataset
  filename_suffix: ".tdms"
Note that TDMSDataset is a dataset I created to handle TDMS files.
Guillaume Tauzin
09/09/2024, 1:42 PM
checkpoint:
  filepath: DATADIRPATH/{source}/MY_CHECKPOINT
Elena Khaustova
09/09/2024, 11:04 PM
TDMSDataset with a default one, for example pandas.CSVDataset?
Guillaume Tauzin
09/10/2024, 9:18 AM
Guillaume Tauzin
09/10/2024, 9:25 AM
confirms is not namespaced and, at that time, @Nok Lam Chan suggested putting the namespace in the argument, e.g. confirms=namespace.data, as a workaround.
Guillaume Tauzin
09/10/2024, 9:31 AM
confirms as @Nok Lam Chan suggested. I believe the default confirms is set to the incremental dataset name without the namespace, and that non-namespaced name does not actually exist.
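For anyone landing on this thread later, the workaround discussed above might look roughly like the sketch below. It is only an illustration under assumptions: the namespace "source_a", the function process_partitions and the output name are hypothetical, the dataset names are written fully qualified by hand, and the node is assumed not to be wrapped in a namespaced pipeline afterwards. The Kedro import is guarded only so that the snippet also runs where Kedro is not installed.

```python
# Sketch of the workaround: pass the fully namespaced catalog name to `confirms`.
# All names below are hypothetical placeholders, not taken from the real project.

def process_partitions(partitions: dict) -> int:
    """Toy node function: count the new partitions handed over by the dataset."""
    return len(partitions)

NAMESPACE = "source_a"  # hypothetical value matched by the "{source}" pattern
DATASET = f"{NAMESPACE}.my_increment_dataset"  # fully qualified catalog name

node_kwargs = dict(
    func=process_partitions,
    inputs=DATASET,
    outputs=f"{NAMESPACE}.n_partitions",  # hypothetical output name
    confirms=DATASET,  # namespaced name instead of the bare dataset name
    name="process_partitions",
)

try:
    from kedro.pipeline import node
except ImportError:  # Kedro not available: keep the kwargs for inspection only
    node = None

process_node = node(**node_kwargs) if node is not None else None
```

The only point of the sketch is that confirms carries the same namespaced string as the node input, which is the name that catalog.confirm(...) was called with successfully in Jupyter.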