# questions
g
Hi team, can we create a node that updates a catalog entry? For example, my node would add one row to an existing catalog entry. The idea was to save it back to the same entry, but we can't have a node with the same inputs and outputs. Have you encountered this issue? Any workaround?
y
Overall, I think it violates run reproducibility principles a bit. Kedro pipelines are directed acyclic graphs (DAGs), and "acyclic" here means that no step can have outputs that are the same as its inputs. The only exception is transcoding. The closest thing in Kedro to what you mentioned is IncrementalDataset, but that increments data from run to run, not within a single pipeline run.
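For reference, transcoding declares two "views" of the same physical file under one logical name, so different nodes can read and write it with different dataset types. This is a sketch only: the file path is hypothetical, and the exact dataset class names come from kedro-datasets and vary between versions (e.g. `CSVDataSet` vs `CSVDataset`).

```yaml
# catalog.yml -- transcoding: one physical file, two typed views of it.
# Path and dataset class names are illustrative assumptions.
my_dataset@pandas:
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/my_data.parquet

my_dataset@spark:
  type: spark.SparkDataset
  filepath: data/02_intermediate/my_data.parquet
  file_format: parquet
```

One node can then output `my_dataset@pandas` while a downstream node consumes `my_dataset@spark`; Kedro treats them as one dataset for ordering purposes.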
👍 1
e
Hi Giovanna, there’s no straightforward way to do that. We do not recommend modifying datasets in place. As Yury mentioned, this operation contradicts several Kedro principles, such as the immutability of the catalog and datasets, and it breaks node execution order.
g
Thank you!! Got it!!
n
From Kedro's perspective, you cannot have the same output (dataset) duplicated. You can, however, have multiple datasets pointing to the same physical location if necessary. For example:
```yaml
# catalog.yml
my_dataset:
  type: pandas.CSVDataset
  filepath: my_path/data.csv

my_enriched_dataset:
  type: pandas.CSVDataset
  filepath: my_path/data.csv
```
Then you can have a node:
```python
node(do_something, inputs="my_dataset", outputs="my_enriched_dataset")
```
Kedro is fine with this, but as @Yury Fedotov pointed out, it violates the reproducibility principle, so you cannot expect re-running `kedro run` to give you the same result.
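To make the two-dataset pattern concrete, here is a minimal sketch of what `do_something` could look like: a plain function that returns a new DataFrame with one row appended, rather than mutating its input. The column names and the new row are hypothetical, purely for illustration.

```python
import pandas as pd


def do_something(catalog_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the input with one extra row appended.

    The node writes to a *separate* output dataset (my_enriched_dataset),
    even though it points at the same file, so inputs and outputs stay
    distinct and the DAG remains valid.
    """
    # Hypothetical new row; column names are assumptions for illustration.
    new_row = pd.DataFrame([{"id": 999, "value": "new entry"}])
    # pd.concat returns a new DataFrame, leaving the input untouched.
    return pd.concat([catalog_df, new_row], ignore_index=True)
```

The key design point is that the function is pure: the original DataFrame is never modified, which is what keeps the overwrite-via-a-second-dataset trick from corrupting in-memory state mid-run.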