I want to create a new kedro project for ML and I ...
# questions
m
I want to create a new kedro project for ML and I am not sure how to properly structure it. I want to have a default pipeline consisting of a feat and modelling pipeline. Both the feat and modelling pipelines will consist of several sub-pipelines and I want to make sure that nested pipeline structure is somehow reflected in my project structure. I was thinking about nested dirs in the pipelines folder, e.g.
pipelines/
  - feat/
    __init__.py
    pipelines.py  <-- contains all subpipelines in this folder, e.g. feat_sales
    - feat_sales/
      __init__.py
      nodes.py
      pipelines.py
    - …
Would this be the right approach? And if not, what is the recommended way to structure this? Do we use modular pipelines or regular pipelines?
d
By default, Kedro only "discovers" pipelines at the top level (i.e. directly under the `pipelines/` folder). If you want Kedro to discover 2 pipelines, `feat` and `modeling`, then this is fine. It should be called `pipeline.py` and not `pipelines.py`, just to be clear. The `sales` subpackage, etc. can be imported from and used in the `create_pipeline()` function of `pipelines/feat/pipeline.py`, yes.
Do we use modular pipelines or regular pipelines?
There's not really a distinction. Note: With this approach you can't exactly do `kedro run --pipeline feat_sales`. Is that important to you, or do you really only want to expose `feat` and `modeling`?
m
And what is then the recommended approach to define e.g. modular pipelines and subpipelines? I’m talking in terms of project (folder) structure…
d
There's no problem with what you've proposed above. Subpipelines aren't anything special, so if your proposed structure works for you, that's fine.