#questions

Flavien

07/31/2023, 3:42 PM
Hi fellows, I followed the documentation for packaging Iris on Databricks and it works really well 👍. I wanted to go a step further, using
ManagedTableDataset
— which works great too — and run different independent pipelines defined on the same project, but I did not manage to do so. I modified the
databricks_run.py
to account for a
--pipeline
option but I think the problem is in packaging the project which does not take into account pipelines created through
kedro pipeline create
if I am not mistaken (but I probably am). Would you point me towards my mistake? Thanks!
OK, I think I found it. There was a missing
__init__.py
under
pipelines
after creation. 😅
👍 1
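For anyone trying the same thing, here is a minimal sketch of what an entry point with a --pipeline option could look like. It is not the script from the docs or from this thread: the function names build_parser and main are made up for illustration, and it assumes a Kedro version where KedroSession.create accepts env and conf_source as keyword arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for the entry point; --pipeline is the addition."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", dest="env", type=str, default=None)
    parser.add_argument("--conf-source", dest="conf_source", type=str, default=None)
    parser.add_argument("--package-name", dest="package_name", type=str, required=True)
    # Leaving this unset (None) lets Kedro fall back to the "__default__" pipeline.
    parser.add_argument("--pipeline", dest="pipeline", type=str, default=None)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)

    # Imported lazily so the parser can be exercised without Kedro installed.
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    # Point Kedro at the packaged project before creating a session.
    configure_project(args.package_name)
    with KedroSession.create(env=args.env, conf_source=args.conf_source) as session:
        # pipeline_name must match a key registered in pipeline_registry.py.
        session.run(pipeline_name=args.pipeline)
```

The script would call main() at the bottom, and the job would then pass something like --package-name iris --pipeline abcdef. The pipeline can only be found if its package was actually included in the built wheel.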

datajoely

08/01/2023, 8:04 AM
Is there a way we could have made this clearer in the docs?
👋 1

Flavien

08/01/2023, 9:17 AM
I managed to understand what confused me. If you start from
databricks-iris
and you create a pipeline with
kedro pipeline create
, the folder
pipelines
does not have
__init__.py
as shown here
kedro pipeline create abcdef
Creating the pipeline 'abcdef': OK
  Location: '/tmp/iris/src/iris/pipelines/abcdef'
Creating '/tmp/iris/src/tests/pipelines/abcdef/__init__.py': OK
Creating '/tmp/iris/src/tests/pipelines/abcdef/test_pipeline.py': OK
Creating '/tmp/iris/conf/base/parameters': 
  Creating '/tmp/iris/conf/base/parameters/abcdef.yml': OK

Pipeline 'abcdef' was successfully created.
When creating a new project with
kedro new
, however, the folder does contain the file needed for packaging
└── src
    ├── abcdef
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── pipeline_registry.py
    │   ├── pipelines
    │   │   └── __init__.py
    │   └── settings.py
    ├── requirements.txt
    ├── setup.py
__main__
is also not present in the starter which is a bit confusing if one follows the documentation on packaging. My humble opinion is that, maybe, it would be clearer to follow the usual structure for
databricks-iris
which would also allow for running different pipelines from the same package.
But
kedro
is still a fantastic library.
❤️ 3

datajoely

08/01/2023, 9:18 AM
Yeah we should fix this! I’d also suggest we do an explicit check for an __init__.py before importing and provide a better error message
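A rough sketch of what such a check could look like, using only the standard library. This is not Kedro's implementation: find_missing_init and check_package are hypothetical names, and the error wording is only a suggestion. The likely reason it matters for packaging is that setuptools' find_packages only picks up directories that contain an __init__.py, assuming the project's setup.py relies on it.

```python
from pathlib import Path
from typing import List


def find_missing_init(package_dir: Path) -> List[Path]:
    """Return the directories in a package tree that have no __init__.py."""
    subdirs = sorted(p for p in package_dir.rglob("*") if p.is_dir())
    candidates = [package_dir, *subdirs]
    return [
        d
        for d in candidates
        # Bytecode caches are not packages, so they are skipped.
        if "__pycache__" not in d.parts and not (d / "__init__.py").is_file()
    ]


def check_package(package_dir: Path) -> None:
    """Raise a readable error before an import fails in a less obvious way."""
    missing = find_missing_init(package_dir)
    if missing:
        listing = ", ".join(str(d) for d in missing)
        raise ModuleNotFoundError(
            f"No __init__.py found in: {listing}. "
            "These folders may be left out when the project is packaged."
        )
```

Running check_package(Path("src/iris")) on the layout described above would flag the pipelines folder until the file is added.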