Hi everyone! I’m facing an issue with Kedro-Viz (Kedro #questions)

Mohamed El Guendouz

10/24/2025, 3:32 PM

Hi everyone! 🙂 I’m facing an issue with Kedro-Viz. I have a node that performs a merge into a Delta Table. In this node, I pass two inputs:
• the dataframe to be inserted, and
• the destination Delta Table itself.
Inside the node, I execute the merge logic directly. The problem is that Kedro-Viz treats the Delta Table as an input, whereas I’d like it to be represented as the output after the merge, so that the lineage is clearer and reflects the actual data flow. Is there a way to indicate which dataset is the true input and which one should be considered the final output in this kind of use case? Thanks for your help! 🙏

Ravi Kumar Pilla

10/24/2025, 3:42 PM

Hi @Mohamed El Guendouz, if I understand correctly, your input and output datasets are the same? This looks like an anti-pattern: a node is meant to be a pure Python function, and in-place mutations are discouraged. I would suggest having a destination dataset that points to the same location but has a different entry in the catalog. That way you will not modify the inputs to a node. Let me know if it works. Thank you

Mohamed El Guendouz

10/24/2025, 3:45 PM

@Ravi Kumar Pilla Actually, the existing Dataset for managing Delta Tables only supports reading, not updating. The merge logic isn’t handled by the Dataset at all. 😞 So I’m forced to perform the merge myself inside a Python function rather than having it managed at the Dataset level. As a result, the node doesn’t return a dataset as output; it returns either `None` or just a flag to confirm that everything worked correctly.

Ravi Kumar Pilla

10/24/2025, 3:50 PM

I did not quite understand this. I get that your node returns None or just a flag. If the existing dataset only supports reading, how are you merging into a Delta table? Is this an issue only with Kedro-Viz, or do you have issues with `kedro run` too?

Ravi Kumar Pilla

10/24/2025, 3:53 PM

As far as I understand, your setup looks like this: ds1, ds2 -> func (ds1 = ds1 + ds2) -> None. Mutating a dataset in place like this is not the recommended approach.
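To illustrate the distinction with plain Python lists standing in for datasets (the function names are made up for this example): the first function mutates its input and returns nothing, which is the shape described above; the second leaves its inputs alone and returns a new value that Kedro could register as an output.

```python
def merge_in_place(ds1: list, ds2: list) -> None:
    # Mutates ds1; the caller (and Kedro) sees no output.
    ds1.extend(ds2)


def merge_pure(ds1: list, ds2: list) -> list:
    # Leaves both inputs untouched and returns the combined result.
    return ds1 + ds2


existing = [1, 2]
incoming = [3]

# Pure version: `existing` is unchanged, the result is a new list.
merged = merge_pure(existing, incoming)

# In-place version: `existing` itself grows, and nothing is returned.
result = merge_in_place(existing, incoming)
```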

Mohamed El Guendouz

10/24/2025, 3:54 PM

@Ravi Kumar Pilla Thanks for your question! Let me clarify: the merge is done inside the node using the Python `delta` library (outside of Kedro’s Dataset abstraction). I load the existing Delta Table, run the merge logic programmatically, and then commit the changes. Since the current Dataset only supports reading, Kedro treats it purely as an input. That’s why the node technically returns either `None` or a simple flag: the actual update happens internally and is not returned as a Kedro-managed dataset. So there is no issue when running `kedro run`; it works fine. The concern is mainly with Kedro-Viz, because the lineage shows the Delta Table only as an input, while in reality it is also the updated output. Here’s a simplified example of what the node logic looks like:

from delta.tables import DeltaTable
from pyspark.sql import DataFrame


def merge_into_delta(existing_table: DeltaTable, new_data_df: DataFrame) -> None:
    # existing_table is assumed to be the DeltaTable object that the
    # read-only Kedro dataset loads, so it is used directly here.
    (
        existing_table.alias("target")
        .merge(
            new_data_df.alias("source"),
            "target.id = source.id",
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # No dataset returned, just return None or a flag
    return None

Ravi Kumar Pilla

10/24/2025, 3:59 PM

Even if it is an updated output, since it is not part of the node `outputs`, Kedro-Viz has no way of knowing that this node outputs something. Let me see if there is a workaround. For now, I think Kedro-Viz is working as expected, given that your node does an in-place update and returns None.

Mohamed El Guendouz

10/24/2025, 4:01 PM

Yes, I totally understand. The only issue is that, given the way the Dataset for Delta Tables was implemented, if I try to set the table as an output, it raises a `DatasetError`. 😞

Mohamed El Guendouz

10/24/2025, 4:04 PM

This is really a problem for our team, both in terms of pipeline design and for Kedro-Viz.

Ravi Kumar Pilla

10/24/2025, 4:07 PM

To show the lineage in Kedro-Viz, a workaround could be to add an output with a similar name (maybe a memory dataset). But let me think about whether there is a better solution.
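A rough sketch of that workaround, assuming the `Pipeline` class and `node` helper from `kedro.pipeline`: the merge node returns a small status flag, and the flag is registered as an output named after the table so Kedro-Viz has an edge to draw. The dataset names (`delta_table`, `new_data`, `delta_table_merged`) and the node name are hypothetical, and the merge logic simply mirrors the node posted above.

```python
from typing import Any


def merge_into_delta(existing_table: Any, new_data_df: Any) -> bool:
    """Merge new rows into the Delta table in place and return a flag.

    `existing_table` is assumed to be a delta.tables.DeltaTable and
    `new_data_df` a Spark DataFrame.
    """
    (
        existing_table.alias("target")
        .merge(new_data_df.alias("source"), "target.id = source.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
    # The returned flag is what Kedro (and Kedro-Viz) sees as the output.
    return True


def create_pipeline():
    # Imported here so the function above stays importable without Kedro.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(
                func=merge_into_delta,
                inputs=["delta_table", "new_data"],
                # Hypothetical name; if it is left out of the catalog,
                # Kedro keeps it in memory, so it only carries lineage.
                outputs="delta_table_merged",
                name="merge_into_delta_node",
            )
        ]
    )
```

With something like this, Kedro-Viz should draw `delta_table` and `new_data` flowing into the node and `delta_table_merged` flowing out. The table itself is still read-only from Kedro's point of view, so this only improves the picture, not the underlying design.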

Mohamed El Guendouz

10/24/2025, 4:09 PM

Would it be possible to evolve the Dataset to handle merge and write operations for Delta Tables? This would simplify the node design and make Kedro-Viz lineage more accurate.

Ravi Kumar Pilla

10/24/2025, 4:10 PM

Yes, I am looking at the Delta tables code now. Could you open an issue describing the pain points? That way we can track and prioritize it in the upcoming sprints. Thank you

Ravi Kumar Pilla

10/24/2025, 4:11 PM

https://github.com/kedro-org/kedro-plugins/issues

Mohamed El Guendouz

10/24/2025, 4:14 PM

Yes 👍 Bug report or feature request?

Ravi Kumar Pilla

10/24/2025, 4:15 PM

A feature request would be nice. I also see a related spike: https://github.com/kedro-org/kedro-plugins/issues/542

Ravi Kumar Pilla

10/24/2025, 4:16 PM

We will try to address these issues in upcoming sprints. Thanks for your patience

Mohamed El Guendouz

10/24/2025, 4:21 PM

@Ravi Kumar Pilla I’ve created an issue: https://github.com/kedro-org/kedro-plugins/issues/1223 🙂

Mohamed El Guendouz

10/24/2025, 4:24 PM

@Ravi Kumar Pilla Thank you for your help!

Ravi Kumar Pilla

10/24/2025, 4:30 PM

Hi @Mohamed El Guendouz, in the meantime you can also create a custom dataset with a save operation, something like https://github.com/kedro-org/kedro-plugins/issues/542#issuecomment-1981483776 Thank you
