Kedro #questions

Flavien

07/31/2023, 3:42 PM

Hi fellows, I followed the documentation for packaging Iris on Databricks and it works really well 👍. I wanted to go a step further, using `ManagedTableDataset` (which works great too) and running different independent pipelines defined in the same project, but I did not manage to do so. I modified `databricks_run.py` to account for a `--pipeline` option, but I think the problem is in packaging the project, which does not take into account pipelines created through `kedro pipeline create`, if I am not mistaken (but I probably am). Would you point me towards my mistake? Thanks!
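
For readers following along, here is a minimal sketch of what an entry point with a `--pipeline` option could look like. The modified script itself is not shown in the thread, so the function names and argument layout below are assumptions, and the `conf_source` argument assumes a Kedro release whose `KedroSession.create` accepts it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a job entry point (hypothetical layout)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--package-name", dest="package_name", type=str, required=True)
    parser.add_argument("--env", dest="env", type=str, default=None)
    parser.add_argument("--conf-source", dest="conf_source", type=str, default=None)
    # Name of a pipeline registered in pipeline_registry.py.
    # None lets Kedro fall back to its default pipeline.
    parser.add_argument("--pipeline", dest="pipeline", type=str, default=None)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)

    # Imported lazily so that argument parsing can be checked without Kedro installed.
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    # The packaged project must be importable under this name.
    configure_project(args.package_name)
    with KedroSession.create(env=args.env, conf_source=args.conf_source) as session:
        session.run(pipeline_name=args.pipeline)
```

A job would then call `main()` with arguments such as `--package-name iris --pipeline abcdef`. This only helps if the named pipeline actually ends up inside the installed package.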

Flavien

07/31/2023, 3:57 PM

OK, I think I found it. There was a missing `__init__.py` under `pipelines` after creation. 😅

👍 1
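
A missing `__init__.py` like this can be spotted before packaging. The sketch below is a hypothetical helper, not part of Kedro. It lists folders that contain Python files somewhere beneath them but have no `__init__.py` of their own. It assumes the project's `setup.py` relies on `setuptools.find_packages`, which only picks up directories that contain an `__init__.py`.

```python
from pathlib import Path


def find_missing_init(package_root: Path) -> list[Path]:
    """Return folders under package_root that hold Python files but lack __init__.py."""
    missing = []
    for directory in [package_root, *package_root.rglob("*")]:
        if not directory.is_dir() or directory.name == "__pycache__":
            continue
        # A folder matters if any .py file lives in it or below it.
        has_python = any(directory.rglob("*.py"))
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory)
    return sorted(missing)
```

Running `find_missing_init(Path("src/iris"))` from the project root (the path is an example) would report `src/iris/pipelines` in the situation described above.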

datajoely

08/01/2023, 8:04 AM

Is there a way we could have made this clearer in the docs?

👋 1

Flavien

08/01/2023, 9:17 AM

I managed to understand what confused me. If you start from `databricks-iris` and create a pipeline with `kedro pipeline create`, the folder `pipelines` does not have an `__init__.py`, as shown here:

```
kedro pipeline create abcdef
Creating the pipeline 'abcdef': OK
  Location: '/tmp/iris/src/iris/pipelines/abcdef'
Creating '/tmp/iris/src/tests/pipelines/abcdef/__init__.py': OK
Creating '/tmp/iris/src/tests/pipelines/abcdef/test_pipeline.py': OK
Creating '/tmp/iris/conf/base/parameters': 
  Creating '/tmp/iris/conf/base/parameters/abcdef.yml': OK

Pipeline 'abcdef' was successfully created.
```

When creating a new project with `kedro new`, however, the folder does have the file needed for packaging:

```
└── src
    ├── abcdef
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── pipeline_registry.py
    │   ├── pipelines
    │   │   └── __init__.py
    │   └── settings.py
    ├── requirements.txt
    ├── setup.py
```

`__main__.py` is also not present in the starter, which is a bit confusing if one follows the documentation on packaging. My humble opinion is that it might be clearer for `databricks-iris` to follow the usual structure, which would also allow running different pipelines from the same package.

Flavien

08/01/2023, 9:17 AM

But `kedro` is still a fantastic library.

❤️ 3

datajoely

08/01/2023, 9:18 AM

Yeah we should fix this! I’d also suggest we do an explicit check for an `__init__.py` before importing and provide a better error message
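
One way such a check could look is sketched below. It is only an illustration of the idea, not Kedro's implementation. The function name, the `src_dir` layout, and the wording of the message are all assumptions.

```python
import importlib
from pathlib import Path


def import_pipelines_package(src_dir: Path, package_name: str):
    """Import `<package_name>.pipelines`, failing early with a descriptive message."""
    pipelines_dir = src_dir / package_name / "pipelines"
    # Check explicitly instead of letting a later import or packaging step fail obscurely.
    if pipelines_dir.is_dir() and not (pipelines_dir / "__init__.py").is_file():
        raise ModuleNotFoundError(
            f"'{pipelines_dir}' exists but has no __init__.py, so it may be left out "
            "when the project is packaged. Add an empty __init__.py to this folder."
        )
    # Assumes src_dir is already on sys.path (or the package is installed).
    return importlib.import_module(f"{package_name}.pipelines")
```

Raising before the import keeps the message focused on the missing file rather than on whichever pipeline happens to be requested first.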
