Kedro #questions

Chris Schopp

05/28/2024, 4:02 PM

As a particular modular pipeline becomes more complex, I'm thinking of breaking up its `nodes.py` into multiple files in a `nodes/` directory. For example, `<project>.pipelines.<modular_pipeline>.nodes/` would have `logic1.py` and `logic2.py`, with the functions that will be used in nodes and their private helper functions. And in `<modular_pipeline>.pipeline.py` I'd import the functions from `nodes/logic1` and `nodes/logic2`.

My question is: if I want to do the above, should I just be breaking this up into multiple modular pipelines instead? The modular pipeline's purpose is to produce an input to a simulation, but the simulation's input is becoming more "refined" over time. So it makes sense to me to keep it as one modular pipeline, but I'm curious how others approach this.
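A minimal sketch of the layout described above, assuming `nodes/` is turned into a regular Python package whose `__init__.py` re-exports the public functions. The package name `my_pipeline` and the functions `clean_inputs` and `build_simulation_input` are hypothetical, and the files are written to a temporary directory only so the example runs on its own without a Kedro project.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout (names are illustrative, not from the thread):
#   my_pipeline/
#     nodes/
#       __init__.py   re-exports the public node functions
#       logic1.py     node function + its private helper
#       logic2.py     node function
FILES = {
    "my_pipeline/__init__.py": "",
    "my_pipeline/nodes/__init__.py": """
        from .logic1 import clean_inputs
        from .logic2 import build_simulation_input

        __all__ = ["clean_inputs", "build_simulation_input"]
    """,
    "my_pipeline/nodes/logic1.py": """
        def _drop_missing(rows):
            # Private helper: stays inside logic1.py.
            return [r for r in rows if r is not None]

        def clean_inputs(rows):
            return _drop_missing(rows)
    """,
    "my_pipeline/nodes/logic2.py": """
        def build_simulation_input(rows):
            return {"n_rows": len(rows), "rows": rows}
    """,
}

# Write the package to a temp dir so the sketch is self-contained.
root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(textwrap.dedent(source))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The kind of import pipeline.py would use. Because nodes/ is a package,
# it looks the same as importing from a single nodes.py file.
from my_pipeline.nodes import build_simulation_input, clean_inputs

result = build_simulation_input(clean_inputs([1, None, 2]))
```

With this arrangement, `pipeline.py` can keep importing from `nodes` while the logic and its private helpers are split across files; the imported functions would then be passed to Kedro's `node(...)` as usual.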

datajoely

05/28/2024, 4:42 PM

So the functions referenced by nodes in the modular pipeline can live anywhere. I feel the `nodes.py` we generate is a suggestion of where to get started… but not a mandate

datajoely

05/28/2024, 4:42 PM

As projects grow in sophistication, we actually recommend that your business logic live in formal Python packages / libraries outside of your project, so they can be maintained, documented and tested without tight coupling to your pipeline
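One way to picture that decoupling, as a rough sketch: a function that lives in a standalone package, takes and returns plain Python values, and is unit-tested without any Kedro objects. The module path `simtools/inputs.py` and the function `scale_readings` are hypothetical names used only for illustration.

```python
# Hypothetical module in a standalone package, e.g. simtools/inputs.py.
# It imports nothing from Kedro, so it can be tested and versioned separately.
from typing import Iterable


def scale_readings(readings: Iterable[float], factor: float) -> list[float]:
    """Scale raw readings by a constant positive factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [r * factor for r in readings]


def test_scale_readings() -> None:
    # A plain unit test: no catalog, runner, or pipeline is involved.
    assert scale_readings([1.0, 2.5], 2.0) == [2.0, 5.0]


test_scale_readings()
```

A pipeline would then import `scale_readings` from the installed package and wrap it in a node, so the project itself only holds the wiring.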

datajoely

05/28/2024, 4:43 PM

As for what's recommended, it's hard to say in general terms. I try to live by 'write code for someone else to read, even if that person is future you'. So look to organise your code in a way that feels readable and understandable, and that makes it easy to onboard a new team member

👍 2

💡 3
