I want to create a new kedro project for ML and I am not sur Kedro #questions

Matthias Roels

02/15/2023, 8:23 PM

I want to create a new kedro project for ML and I am not sure how to properly structure it. I want to have a default pipeline consisting of a feat and modelling pipeline. Both the feat and modelling pipelines will consist of several sub-pipelines and I want to make sure that nested pipeline structure is somehow reflected in my project structure. I was thinking about nested dirs in the pipelines folder, e.g.

pipelines/
  - feat/
    __init__.py
    pipelines.py  <-- contains all subpipelines in this folder, e.g. feat_sales
    - feat_sales/
      __init__.py
      nodes.py
      pipelines.py
    - …

Would this be the right approach? And if not, what is the recommended way to structure this? Do we use modular pipelines or regular pipelines?

Deepyaman Datta

02/15/2023, 11:09 PM

By default, Kedro only "discovers" pipelines at the top level (i.e. directly under the `pipelines/` folder). If you want Kedro to discover 2 pipelines, `feat` and `modeling`, then this is fine. Just to be clear, the file should be called `pipeline.py` and not `pipelines.py`. The `feat_sales` subpackage, etc. can be imported from and used in the `create_pipeline()` function in `pipelines/feat/pipeline.py`, yes.

Do we use modular pipelines or regular pipelines?

There's not really a distinction. Note: With this approach you can't exactly do `kedro run --pipeline feat_sales`. Is that important to you, or do you really only want to expose `feat` and `modeling`?

Matthias Roels

02/16/2023, 6:48 PM

And what is then the recommended approach to define e.g. modular pipelines and subpipelines? I’m talking in terms of project (folder) structure…

Deepyaman Datta

02/16/2023, 6:51 PM

There's no problem with what you've proposed above. Subpipelines aren't anything special, so if your proposed structure works for you, that's fine.
