# random
a
💡 Cool insight on where Kedro is still introducing performance overhead, from a use case I'm working on. In almost all cases, either I/O or node execution time will outweigh this overhead by a lot, making it irrelevant, but I'm sharing it anyway 😄
In my case, I am running ~800 dynamically created small nodes (the nodes are all interdependent in a very complex relationship graph, so I rely on Kedro's orchestration without having to codify the execution order myself 🙏). Initially my pipeline took 1min40, and the following two changes reduced it to 8sec:
• Disabling pluggy tracing (see issue)
• Using "list" inputs instead of "dict" inputs in `node` (see the sketch below)
(link to the blog post)
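A minimal sketch (not from the thread; function and dataset names are made up) of the two input styles being compared, assuming a simple pandas join:

```python
from kedro.pipeline import node

def join(customers, orders):
    return customers.merge(orders, on="id")

# "dict" inputs: dataset names are remapped onto the function's parameter names.
# Per the profiling later in this thread, resolving that remapping goes through
# inspect.signature(...).bind, which adds up across many small nodes.
dict_style = node(
    func=join,
    inputs={"customers": "customers_ds", "orders": "orders_ds"},
    outputs="joined_ds",
)

# "list" inputs: datasets are passed positionally, so no remapping is needed.
list_style = node(
    func=join,
    inputs=["customers_ds", "orders_ds"],
    outputs="joined_ds",
)
```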
💡 13
🙌 3
🎉 4
d
I wonder how many times `node.inputs` gets called for each node, and if caching could help on that side. Or, would there be any reason to not just convert `inputs` in the `Node.__init__()` method?
Actually, if you'd be down to try doing the conversion in `Node.__init__()` and measuring the difference in time, and also making a PR with this potential enhancement (if there's a substantial perf improvement) and making sure it doesn't break things... I would be very happy to take a look!
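A minimal sketch of the idea being floated here (illustrative only; `MiniNode` and the helper are stand-ins, not Kedro's actual `Node` code): pay the signature-binding cost once at construction time so that repeated accesses to `inputs` never re-do the work.

```python
import inspect

def dict_inputs_to_list(func, inputs: dict) -> list:
    """Bind dict inputs against ``func``'s signature and return them as an ordered list."""
    bound = inspect.signature(func).bind(**inputs)
    return [*bound.args, *bound.kwargs.values()]

class MiniNode:
    """Illustrative stand-in, not Kedro's Node class."""

    def __init__(self, func, inputs):
        self._func = func
        # Convert eagerly here, so later accesses to .inputs (e.g. while the
        # runner resolves the execution order) never re-bind the signature.
        if isinstance(inputs, dict):
            self._inputs = dict_inputs_to_list(func, inputs)
        else:
            self._inputs = list(inputs)

    @property
    def inputs(self) -> list:
        return self._inputs
```

An alternative with much the same effect is to keep the conversion lazy but memoise its result (e.g. with `functools.cached_property` or `functools.lru_cache`), which is closer in spirit to the caching approach Arnout reports on below.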
c
😄 This feels like a Kedro-Viz Easter egg that Arnout found:
```jsx
<PipelineWarningContent
        isVisible={visible}
        title="Whoa, that's a chonky pipeline!"
```
😂 3
k
Woah, that's very helpful! Thanks! We've been looking into speeding up a >200-node Kedro pipeline somehow (e.g. by grouping nodes), as we noticed that most of our time is spent between the nodes. I just tested disabling tracing alone: speedup from 25s to 15s.
🥳 1
n
Tracing should be disabled by default; there was a ticket for this but I cannot find it anymore. It was causing other issues as well, and since very few people are able to see the debug-level logs, I think there's a strong case for making it optional instead of the default.
Is it possible to read Substack without an account? I cannot read it, but I'm curious whether it has a breakdown of how much of the overhead comes from tracing and node inputs respectively.
k
I had no problem accessing the blog post; there is a breakdown indeed.
thankyou 1
n
Ah, I can read it now. I somehow have the app installed, so it kept forcing me to log in.
For the second point, it's likely something internal to Kedro causing the slowness rather than a list vs dict issue.
@Arnout Verboven are you able to share the pipeline that you are profiling? @Merel I think this fits into the Pipeline discussion and is worth looking into.
a
@Deepyaman Datta Yeah, caching clearly helps 😄
• Original: ~1.5M calls to `_dict_inputs_to_list` (102s)
• With caching: ~2k calls to `_dict_inputs_to_list` (0.12s)
I've made a quick PR but couldn't get the tests to run, so maybe someone else could take this instead
👀 3
@Nok Lam Chan For my pipeline the overhead split was ~50/50 between tracing and node inputs. Unfortunately the pipeline is confidential; however, I am sure it will be easy to reproduce by dynamically creating a pipeline with 1k nodes with small dataframes as inputs (using dict inputs).
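A hypothetical reproduction along those lines (nothing from the confidential pipeline; all names and sizes are made up), building a long chain of tiny nodes wired with dict inputs:

```python
import pandas as pd
from kedro.pipeline import node, pipeline

def passthrough(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a tiny per-node computation.
    return df.copy()

N_NODES = 1_000

repro = pipeline(
    [
        node(
            func=passthrough,
            inputs={"df": f"df_{i}"},  # dict inputs exercise the signature-binding path
            outputs=f"df_{i + 1}",
            name=f"step_{i}",
        )
        for i in range(N_NODES)
    ]
)

# "df_0" is the only free input; register it as a small in-memory DataFrame and
# compare run times with tracing enabled vs disabled, and with dict vs list inputs.
```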
👍🏼 1
n
Some quick profiling of `_dict_inputs_to_list`:
`4.36 μs ± 37.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)`
The operation is quite fast, so I am surprised it takes 100+ seconds.
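For context, a quick back-of-envelope check using only the two numbers quoted in this thread shows why the microbenchmark alone doesn't explain the profile:

```python
per_call_s = 4.36e-6    # per-call time from the %timeit result above
n_calls = 1_500_000     # calls to _dict_inputs_to_list in the original profile
print(per_call_s * n_calls)  # ~6.5 s, well short of the ~102 s measured,
                             # so the real calls are doing substantially more work
```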
a
Oh FYI - in my pipeline I am using `update_wrapper(partial(my_func, ...), my_func)` to provide non-Kedro arguments, which might make the `inspect.signature` call slower? But also keep in mind `_dict_inputs_to_list` was run 1.5M times.
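A minimal sketch of that construction (names are made up) for readers who haven't seen the pattern; resolving a signature for a wrapped partial involves resolving the underlying function's signature and rewriting its parameter list, which could plausibly make each `signature(...).bind` call more expensive than for a plain function:

```python
from functools import partial, update_wrapper
import inspect

def my_func(df, threshold=0.5):
    # Stand-in for a node function with one non-Kedro argument.
    return df[df["value"] > threshold]

# Pre-fill the non-Kedro argument, then copy my_func's metadata onto the partial
# so the node still shows a readable function name.
bound = update_wrapper(partial(my_func, threshold=0.9), my_func)

# inspect has to rebuild the partial's reduced signature from my_func's.
print(inspect.signature(bound, follow_wrapped=False))  # e.g. (df, *, threshold=0.9)
```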
j
this is an awesome writeup @Arnout Verboven 🔥 and very amusing to read too 😄
❤️ 1
the tracing issue should be addressed before the next release (https://github.com/kedro-org/kedro/issues/4504), hopefully sometime in May
looking at your flamegraphs, it looks like most of the `_dict_inputs_to_list` runtime is `inspect.signature(...).bind`. I have some ideas on the caching though
a
Awesome!
j
on a side note @Arnout Verboven how is the experience of generating nodes dynamically? I find it quite impressive that you created such a large pipeline that in the end executes in ~8 seconds, must be lots of small functions 😄
a
I like it! It feels very intuitive to read the code this way, and I typically use dataset factories and namespaced modular pipelines, so it all works together quite nicely. Not sure if "impressive" is the right word haha. I'm indeed doing very simple computations in each node, which could surely be implemented a lot more efficiently by computing everything in one large dataframe. I think this is a typical tradeoff to make with Kedro (and generally any orchestrator?). For "parallel" computations on groups, do I:
• A: Split the groups into separate nodes (which makes my code overall look cleaner, especially within the node functions), or
• B: Handle groups within the node function (which results in less overhead, and makes my pipeline code look a bit cleaner)?
The examples below make it clear that in Option B I would have to codify any execution order of the groups myself.
```python
# Option A

def process(df):
    ...
    return df

groups = ["X", "Y", "Z"]
for group in groups:  # Process each group separately
    node(  # Filter
        func=partial(<filter>, group=group),
        inputs="df",
        outputs=f"{group}.df",
    )
    node(  # Process
        func=process,
        inputs=f"{group}.df",
        outputs=f"{group}.df_processed",
    )
node(  # Combine the groups
    func=<concat>,
    inputs=[f"{group}.df_processed" for group in groups],
    outputs="df_processed",
)
```
```python
# Option B

def process_groups(df):
    for group in df["group"].unique():  # or using groupby
        df.loc[df["group"] == group, ...
    return df

node(
    func=process_groups,
    inputs="df",
    outputs="df_processed",
)
```
💡 1
d
FYI @Arnout Verboven existing tests are all passing; it's just missing test coverage for one line, and there's a typing error.
> I've made a quick PR but couldn't get the tests to run, so maybe someone else could take this instead
I just left a comment on there, but happy to try taking it over (or somebody else on the team can) if you'd prefer.
k
FYI we observed a ~2x improvement in speed with tracing disabled when using kedro-mlflow's packaged model in production (we have only a few cases of dicts that remap the inputs/outputs). I should add that we are using hooks there too, which slows things down in the first place, but by default others shouldn't have this problem.
👀 1
j
@Kacper Leśniara if disabling tracing helped you, could you test https://github.com/kedro-org/kedro/pull/4705 and let us know if it achieves an equivalent effect?