# random
a
💡 Cool insight on where Kedro is still introducing performance overhead, from a use case I'm working on. In almost all cases, either I/O or node execution time will outweigh this overhead by a lot, making it irrelevant, but I'm sharing it anyway 😄
In my case, I am running ~800 dynamically created small nodes (the nodes are all interdependent in a very complex relationship graph, so I rely on Kedro's orchestration without having to codify the execution order myself 🙏). Initially my pipeline took 1min40, and the following two changes reduced it to 8sec:
• Disabling pluggy tracing (see issue)
• Using "list" inputs instead of "dict" inputs in `node` (see the sketch below)
(link to the blog post)
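A minimal sketch (not from the thread; function and dataset names are made up) of the two input styles being compared, assuming a simple pandas join:

```python
from kedro.pipeline import node

def join(customers, orders):
    return customers.merge(orders, on="id")

# "dict" inputs: dataset names are remapped onto the function's parameter names.
# Per the profiling later in this thread, resolving that remapping goes through
# inspect.signature(...).bind, which adds up across many small nodes.
dict_style = node(
    func=join,
    inputs={"customers": "customers_ds", "orders": "orders_ds"},
    outputs="joined_ds",
)

# "list" inputs: datasets are passed positionally, so no remapping is needed.
list_style = node(
    func=join,
    inputs=["customers_ds", "orders_ds"],
    outputs="joined_ds",
)
```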
💡 13
🙌 3
🎉 4
d
I wonder how many times `node.inputs` gets called for each node, and if caching could help on that side. Or, would there be any reason to not just convert `inputs` in the `Node.__init__()` method?
Actually, if you'd be down to try doing the conversion in `Node.__init__()` and measuring the difference in time, and also making a PR with this potential enhancement (if there's a substantial perf improvement) and making sure it doesn't break things... I would be very happy to take a look!
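A minimal sketch of the idea being floated here (illustrative only; `MiniNode` and the helper are stand-ins, not Kedro's actual `Node` code): pay the signature-binding cost once at construction time so that repeated accesses to `inputs` never re-do the work.

```python
import inspect

def dict_inputs_to_list(func, inputs: dict) -> list:
    """Bind dict inputs against ``func``'s signature and return them as an ordered list."""
    bound = inspect.signature(func).bind(**inputs)
    return [*bound.args, *bound.kwargs.values()]

class MiniNode:
    """Illustrative stand-in, not Kedro's Node class."""

    def __init__(self, func, inputs):
        self._func = func
        # Convert eagerly here, so later accesses to .inputs (e.g. while the
        # runner resolves the execution order) never re-bind the signature.
        if isinstance(inputs, dict):
            self._inputs = dict_inputs_to_list(func, inputs)
        else:
            self._inputs = list(inputs)

    @property
    def inputs(self) -> list:
        return self._inputs
```

An alternative with much the same effect is to keep the conversion lazy but memoise its result (e.g. with `functools.cached_property` or `functools.lru_cache`), which is closer in spirit to the caching approach Arnout reports on below.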
c
😄 This feels like a Kedro-Viz Easter egg that Arnout found:
```jsx
<PipelineWarningContent
        isVisible={visible}
        title="Whoa, that's a chonky pipeline!"
```
😂 3
k
Woah, that's very helpful! Thanks! We've been looking into speeding up a >200-node Kedro pipeline somehow (e.g. by grouping nodes), as we noticed that most of our time is spent between the nodes. I just tested disabling tracing alone: speedup from 25s to 15s.
🥳 1
n
Tracing should be disabled by default; there was a ticket for this but I cannot find it anymore. It was causing other issues as well, and since very few people are able to see the debug-level logs, I think there's a strong case for making it optional instead of the default.
Is it possible to read Substack without an account? I cannot read it, but I'm curious whether it has a breakdown of how much of the overhead comes from tracing and node inputs respectively.
k
I had no problem accessing the blog post; there is a breakdown indeed.
thankyou 1
n
Ah, I can read it now. I somehow have the app installed, so it kept forcing me to log in.
For the second point, it's likely something internal to Kedro causing the slowness rather than a list vs dict issue.
@Arnout Verboven are you able to share the pipeline that you are profiling? @Merel I think this fits into the Pipeline discussion and is worth looking into.
a
@Deepyaman Datta Yeah, caching clearly helps 😄
• Original: ~1.5M calls to `_dict_inputs_to_list` (102s)
• With caching: ~2k calls to `_dict_inputs_to_list` (0.12s)
I've made a quick PR but couldn't get the tests to run, so maybe someone else could take this instead
👀 3
@Nok Lam Chan For my pipeline the overhead split was ~50/50 between tracing and node inputs. Unfortunately the pipeline is confidential; however, I am sure it will be easy to reproduce by dynamically creating a pipeline with 1k nodes with small dataframes as inputs (using dict inputs).
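A hypothetical reproduction along those lines (nothing from the confidential pipeline; all names and sizes are made up), building a long chain of tiny nodes wired with dict inputs:

```python
import pandas as pd
from kedro.pipeline import node, pipeline

def passthrough(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a tiny per-node computation.
    return df.copy()

N_NODES = 1_000

repro = pipeline(
    [
        node(
            func=passthrough,
            inputs={"df": f"df_{i}"},  # dict inputs exercise the signature-binding path
            outputs=f"df_{i + 1}",
            name=f"step_{i}",
        )
        for i in range(N_NODES)
    ]
)

# "df_0" is the only free input; register it as a small in-memory DataFrame and
# compare run times with tracing enabled vs disabled, and with dict vs list inputs.
```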
👍🏼 1
n
Some quick profiling of `_dict_inputs_to_list`:
`4.36 μs ± 37.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)`
The operation is quite fast, so I am surprised it takes 100+ seconds.
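For context, a quick back-of-envelope check using only the two numbers quoted in this thread shows why the microbenchmark alone doesn't explain the profile:

```python
per_call_s = 4.36e-6    # per-call time from the %timeit result above
n_calls = 1_500_000     # calls to _dict_inputs_to_list in the original profile
print(per_call_s * n_calls)  # ~6.5 s, well short of the ~102 s measured,
                             # so the real calls are doing substantially more work
```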
a
Oh FYI - in my pipeline I am using `update_wrapper(partial(my_func, ...), my_func)` to provide non-Kedro arguments, which might make the `inspect.signature` call slower? But also keep in mind `_dict_inputs_to_list` was run 1.5M times.
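A minimal sketch of that construction (names are made up) for readers who haven't seen the pattern; resolving a signature for a wrapped partial involves resolving the underlying function's signature and rewriting its parameter list, which could plausibly make each `signature(...).bind` call more expensive than for a plain function:

```python
from functools import partial, update_wrapper
import inspect

def my_func(df, threshold=0.5):
    # Stand-in for a node function with one non-Kedro argument.
    return df[df["value"] > threshold]

# Pre-fill the non-Kedro argument, then copy my_func's metadata onto the partial
# so the node still shows a readable function name.
bound = update_wrapper(partial(my_func, threshold=0.9), my_func)

# inspect has to rebuild the partial's reduced signature from my_func's.
print(inspect.signature(bound, follow_wrapped=False))  # e.g. (df, *, threshold=0.9)
```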
j
this is an awesome writeup @Arnout Verboven 🔥 and very amusing to read too 😄
❤️ 1
the tracing issue should be addressed before the next release (https://github.com/kedro-org/kedro/issues/4504), hopefully sometime in May
looking at your flamegraphs, it looks like most of the `_dict_inputs_to_list` runtime is `inspect.signature(...).bind`. I have some ideas on the caching though
a
Awesome!
j
on a side note @Arnout Verboven how is the experience of generating nodes dynamically? I find it quite impressive that you created such a large pipeline that in the end executes in ~8 seconds, must be lots of small functions 😄
a
I like it! It feels very intuitive to read the code this way, and I typically use dataset factories and namespaced modular pipelines, so it all works together quite nicely. Not sure if "impressive" is the right word haha. I'm indeed doing very simple computations in each node, which could surely be implemented a lot more efficiently by computing everything in one large dataframe. I think this is a typical tradeoff to make with Kedro (and generally any orchestrator?). For "parallel" computations on groups, do I:
• A: Split the groups into separate nodes (which makes my code overall look cleaner, especially within the node functions), or
• B: Handle groups within the node function (which results in less overhead, and makes my pipeline code look a bit cleaner)?
The examples below make it clear that in Option B I would have to codify any execution order of the groups myself.
```python
# Option A

def process(df):
    ...
    return df

groups = ["X", "Y", "Z"]
for group in groups:  # Process each group separately
    node(  # Filter
        func=partial(<filter>, group=group),
        inputs="df",
        outputs=f"{group}.df",
    )
    node(  # Process
        func=process,
        inputs=f"{group}.df",
        outputs=f"{group}.df_processed",
    )
node(  # Combine the groups
    func=<concat>,
    inputs=[f"{group}.df_processed" for group in groups],
    outputs="df_processed",
)
```
```python
# Option B

def process_groups(df):
    for group in df["group"].unique():  # or using groupby
        df.loc[df["group"] == group, ...
    return df

node(
    func=process_groups,
    inputs="df",
    outputs="df_processed",
)
```
💡 1
d
FYI @Arnout Verboven existing tests are all passing; it's just missing test coverage for one line, and there's a typing error.
> I've made a quick PR but couldn't get the tests to run, so maybe someone else could take this instead
I just left a comment on there, but happy to try taking it over (or somebody else on the team can) if you'd prefer.
k
FYI we observed a ~2x improvement in speed with tracing disabled when using kedro-mlflow's packaged model in production (we have only a few cases of dicts that remap the inputs/outputs). I should add that we are using hooks there too, which slows things down in the first place, but by default others shouldn't have this problem.
👀 1
j
@Kacper Leśniara if disabling tracing helped you, could you test https://github.com/kedro-org/kedro/pull/4705 and let us know if it achieves an equivalent effect?