Guillaume Tauzin
09/09/2024, 12:58 PM
confirms, then the execution of the node whose input is my incremental dataset should result in the creation or update of the CHECKPOINT file. This does not happen here: running my pipeline N times processes the available data N times.
Adding a confirms argument to the nodes of my pipeline does not change this behavior. The logger does print that the incremental dataset has been confirmed, but no CHECKPOINT file is generated or updated.
Calling catalog.confirm("NAMESPACE.INCREMENTALDATASET") in Jupyter, however, does create the checkpoint file.
Any hint on what might be happening? How can I investigate this further?
Huong Nguyen
09/09/2024, 1:37 PM
Guillaume Tauzin
09/09/2024, 1:41 PM
"{source}.my_increment_dataset":
  type: partitions.IncrementalDataset
  path: DATADIRPATH/{source}/
  dataset:
    type: ifd.datasets.TDMSDataset
  filename_suffix: ".tdms"
Note that TDMSDataset is a dataset I created to handle TDMS files.
Guillaume Tauzin
09/09/2024, 1:42 PM
checkpoint:
  filepath: DATADIRPATH/{source}/MY_CHECKPOINT
Elena Khaustova
09/09/2024, 11:04 PM
TDMSDataset with a default one, for example pandas.CSVDataset?
Guillaume Tauzin
09/10/2024, 9:18 AM
Guillaume Tauzin
09/10/2024, 9:25 AM
confirms is not namespaced and, at that time, @Nok Lam Chan suggested putting the namespace in the argument, e.g. confirms=namespace.data, as a workaround.
Guillaume Tauzin
09/10/2024, 9:31 AM
confirms as @Nok Lam Chan suggested. I believe the default confirms is set to the incremental dataset name without the namespace, and a dataset with that un-namespaced name does not actually exist.
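
To make the hypothesis above concrete, here is a minimal toy model in plain Python. It is not Kedro's implementation: ToyIncrementalDataset, ToyCatalog and run_once are hypothetical names invented for illustration, and the checkpoint is kept in memory instead of a CHECKPOINT file. It only sketches the idea that the checkpoint advances when confirm is called under the dataset's full, namespaced catalog name. In this toy an unknown name raises an error, whereas the thread reports that the real run logged a confirmation without writing a checkpoint, so the actual failure mode may differ.

```python
# Toy model only: NOT Kedro's implementation. All names are hypothetical.


class ToyIncrementalDataset:
    """Loads partitions newer than a checkpoint; confirm() advances it."""

    def __init__(self, partitions):
        self.partitions = sorted(partitions)
        self.checkpoint = None  # stands in for the CHECKPOINT file

    def load(self):
        # Without a checkpoint, every partition counts as new.
        if self.checkpoint is None:
            return list(self.partitions)
        return [p for p in self.partitions if p > self.checkpoint]

    def confirm(self):
        # Record the last partition seen so later loads skip it.
        if self.partitions:
            self.checkpoint = self.partitions[-1]


class ToyCatalog:
    """Maps full dataset names (namespace included) to datasets."""

    def __init__(self, datasets):
        self.datasets = datasets

    def confirm(self, name):
        # Assumption of this toy: lookup is by the exact registered name.
        if name not in self.datasets:
            raise KeyError(f"Dataset '{name}' not found in the catalog")
        self.datasets[name].confirm()


def run_once(catalog, input_name, confirms=None):
    """Load the input, then confirm the named dataset if one is given."""
    loaded = catalog.datasets[input_name].load()
    if confirms is not None:
        catalog.confirm(confirms)
    return loaded


if __name__ == "__main__":
    dataset = ToyIncrementalDataset(["a.tdms", "b.tdms"])
    catalog = ToyCatalog({"namespace.data": dataset})

    # No confirmation: each run sees all the data again.
    print(run_once(catalog, "namespace.data"))
    print(run_once(catalog, "namespace.data"))

    # Confirming under the namespaced name advances the checkpoint.
    print(run_once(catalog, "namespace.data", confirms="namespace.data"))
    print(run_once(catalog, "namespace.data"))
```

In this sketch, passing confirms="data" (without the namespace) fails the lookup, which mirrors the workaround of writing the namespace explicitly in the confirms argument.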