# questions
f
Hi fellows, I followed the documentation for packaging Iris on Databricks and it works really well 👍. I wanted to go a step further by using `ManagedTableDataset` (which works great too) and running different independent pipelines defined in the same project, but I did not manage to do so. I modified `databricks_run.py` to accept a `--pipeline` option, but I think the problem is in packaging the project, which does not take into account pipelines created through `kedro pipeline create`, if I am not mistaken (but I probably am). Could you point me towards my mistake? Thanks!
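A minimal sketch of what such a modified `databricks_run.py` could look like, assuming it follows the entry point from the Kedro Databricks packaging docs (`--env`, `--conf-source`, `--package-name`). The `--pipeline` option and the `parse_args` helper are illustrative additions; the value is passed to `session.run(pipeline_name=...)`, and `None` falls back to the `__default__` pipeline. The name must also be registered in `pipeline_registry.py` of the installed package.

```python
"""Sketch of a databricks_run.py entry point with a hypothetical --pipeline option."""
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a packaged Kedro project")
    parser.add_argument("--env", dest="env", type=str, default=None)
    parser.add_argument("--conf-source", dest="conf_source", type=str, default=None)
    parser.add_argument("--package-name", dest="package_name", type=str, default=None)
    # Hypothetical new option: a registered pipeline name; None means __default__
    parser.add_argument("--pipeline", dest="pipeline", type=str, default=None)
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    # Imported lazily so the argument parsing can be exercised without Kedro installed
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    configure_project(args.package_name)
    with KedroSession.create(env=args.env, conf_source=args.conf_source) as session:
        session.run(pipeline_name=args.pipeline)


if __name__ == "__main__":
    main()
```

The Databricks job parameters would then add something like `--pipeline abcdef` next to the existing `--package-name iris`.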
OK, I think I found it. There was a missing `__init__.py` under `pipelines` after creation. 😅
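The fix is to add an empty `__init__.py` to the `pipelines` folder (for the layout in this thread, `touch src/iris/pipelines/__init__.py` from the project root) and rebuild the package. The hypothetical helper below does the same for the folder and every pipeline subfolder, skipping dunder directories such as `__pycache__`; the function name and the `src/iris/pipelines` path are illustrative assumptions.

```python
from pathlib import Path
from typing import List


def ensure_init_files(pipelines_dir: Path) -> List[Path]:
    """Create empty __init__.py files so pipelines_dir and its pipeline
    subfolders are regular packages; return the files that were created."""
    subfolders = sorted(
        p for p in pipelines_dir.iterdir()
        if p.is_dir() and not p.name.startswith("__")  # skip __pycache__ and similar
    )
    created: List[Path] = []
    for directory in [pipelines_dir, *subfolders]:
        init_file = directory / "__init__.py"
        if not init_file.exists():
            init_file.touch()  # an empty file is enough to mark a regular package
            created.append(init_file)
    return created
```

Calling `ensure_init_files(Path("src/iris/pipelines"))` returns the files it had to create, so a second call returns an empty list.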
d
Is there a way we could have made this clearer in the docs?
f
I managed to understand what confused me. If you start from `databricks-iris` and create a pipeline with `kedro pipeline create`, the `pipelines` folder does not have an `__init__.py`, as shown here:
```
kedro pipeline create abcdef
Creating the pipeline 'abcdef': OK
  Location: '/tmp/iris/src/iris/pipelines/abcdef'
Creating '/tmp/iris/src/tests/pipelines/abcdef/__init__.py': OK
Creating '/tmp/iris/src/tests/pipelines/abcdef/test_pipeline.py': OK
Creating '/tmp/iris/conf/base/parameters': 
  Creating '/tmp/iris/conf/base/parameters/abcdef.yml': OK

Pipeline 'abcdef' was successfully created.
```
Whereas when you create a new project with `kedro new`, the folder does contain the file needed for packaging:
```
└── src
    ├── abcdef
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── pipeline_registry.py
    │   ├── pipelines
    │   │   └── __init__.py
    │   └── settings.py
    ├── requirements.txt
    ├── setup.py
```
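Why the difference only shows up after packaging: assuming the project's `setup.py` collects packages with `setuptools.find_packages()`, a directory without `__init__.py` is not treated as a package and the search does not descend into it, so `pipelines` and every pipeline below it are silently left out of the built package. In a source checkout, Python 3 can still import such a folder as an implicit namespace package (PEP 420), which is why `kedro run` works locally. The function below is a simplified model of that pruning, not the setuptools implementation; its name is hypothetical.

```python
import os
from typing import List


def find_packages_simplified(where: str) -> List[str]:
    """Simplified model of setuptools.find_packages: a directory counts as a
    package only if it contains __init__.py, and non-package directories are
    pruned together with everything below them."""
    packages: List[str] = []
    for root, dirs, _files in os.walk(where):
        kept = []
        for name in sorted(dirs):
            full = os.path.join(root, name)
            if "." in name or not os.path.isfile(os.path.join(full, "__init__.py")):
                continue  # not a package: skip this whole subtree
            packages.append(os.path.relpath(full, where).replace(os.sep, "."))
            kept.append(name)
        dirs[:] = kept  # os.walk (top-down) only descends into the kept directories
    return packages
```

For a layout like the one in this thread (package `iris`, pipeline `abcdef` with its own `__init__.py`, but no `pipelines/__init__.py`), `find_packages_simplified("src")` finds only `iris`; once `src/iris/pipelines/__init__.py` exists, it also finds `iris.pipelines` and `iris.pipelines.abcdef`.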
`__main__` is also missing from the starter, which is a bit confusing if one follows the packaging documentation. In my humble opinion, it might be clearer for `databricks-iris` to follow the usual project structure, which would also allow running different pipelines from the same package.
But `kedro` is still a fantastic library.
ā¤ļø 3
d
Yeah, we should fix this! I'd also suggest we do an explicit check for an `__init__.py` before importing and provide a better error message.
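One way the suggested check could look, as a hypothetical sketch rather than Kedro's actual API: before calling `importlib.import_module`, verify that both `pipelines` and the pipeline's own folder contain `__init__.py`, and raise an error that names the missing file and says why it matters. `MissingInitFileError` and `import_pipeline_module` are illustrative names.

```python
import importlib
from pathlib import Path
from types import ModuleType


class MissingInitFileError(ImportError):
    """Hypothetical error raised when a pipelines folder is not a regular package."""


def import_pipeline_module(package_dir: Path, package_name: str, pipeline_name: str) -> ModuleType:
    pipelines_dir = package_dir / "pipelines"
    # Check every folder on the import path of the pipeline module
    for directory in (pipelines_dir, pipelines_dir / pipeline_name):
        if not (directory / "__init__.py").is_file():
            raise MissingInitFileError(
                f"'{directory}' has no __init__.py, so it is not a regular Python "
                "package and will be left out when the project is packaged. "
                f"Create an empty '{directory / '__init__.py'}' and rebuild."
            )
    return importlib.import_module(f"{package_name}.pipelines.{pipeline_name}")
```

Checking the filesystem first turns a confusing downstream failure (a pipeline that is simply absent from the installed package) into an error that points at the exact file to add.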