Flavien
07/31/2023, 3:42 PM
ManagedTableDataset (which works great too) and run different independent pipelines defined on the same project, but I did not manage to do so.
I modified the databricks_run.py to account for a --pipeline option, but I think the problem is in packaging the project, which does not take into account pipelines created through kedro pipeline create, if I am not mistaken (but I probably am). Would you point me towards my mistake?
Thanks!
(Edit: I got it to work by adding __init__.py under pipelines after creation. 🙂)
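For reference, the --pipeline option described above could be wired up roughly like this. This is a sketch only: the databricks_run.py structure and argument names are assumptions, not Flavien's actual code, but KedroSession.run does accept a pipeline_name argument mirroring kedro run --pipeline.

```python
import argparse


def parse_args(argv=None):
    # Parse the entry-point arguments; --pipeline selects which registered
    # pipeline to run (None falls back to the "__default__" pipeline).
    parser = argparse.ArgumentParser(description="Run a Kedro pipeline on Databricks")
    parser.add_argument("--env", default="base", help="Kedro configuration environment")
    parser.add_argument("--pipeline", default=None, help="Name of the registered pipeline to run")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Imported here so the argument parsing above stays usable without Kedro installed.
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    bootstrap_project(Path.cwd())
    with KedroSession.create(env=args.env) as session:
        # pipeline_name mirrors the `kedro run --pipeline` CLI option.
        session.run(pipeline_name=args.pipeline)


if __name__ == "__main__":
    main()
```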
datajoely
08/01/2023, 8:04 AM
Flavien
08/01/2023, 9:17 AM
When you start from databricks-iris and create a pipeline with kedro pipeline create, the pipelines folder does not have an __init__.py, as shown here:
kedro pipeline create abcdef
Creating the pipeline 'abcdef': OK
Location: '/tmp/iris/src/iris/pipelines/abcdef'
Creating '/tmp/iris/src/tests/pipelines/abcdef/__init__.py': OK
Creating '/tmp/iris/src/tests/pipelines/abcdef/test_pipeline.py': OK
Creating '/tmp/iris/conf/base/parameters':
Creating '/tmp/iris/conf/base/parameters/abcdef.yml': OK
Pipeline 'abcdef' was successfully created.
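Assuming the /tmp/iris layout from the output above, a minimal workaround is to add the missing package marker by hand (the mkdir line only recreates the layout for illustration):

```shell
# Recreate the layout from the output above (illustration only)
mkdir -p /tmp/iris/src/iris/pipelines/abcdef
# Add the missing package marker so setuptools treats `pipelines`
# (and the new pipeline inside it) as packages when the project is built
touch /tmp/iris/src/iris/pipelines/__init__.py
touch /tmp/iris/src/iris/pipelines/abcdef/__init__.py
```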
While creating a new project from kedro new, the folder has indeed the necessary file for packaging:
└── src
    ├── abcdef
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── pipeline_registry.py
    │   ├── pipelines
    │   │   └── __init__.py
    │   └── settings.py
    ├── requirements.txt
    └── setup.py
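The missing marker file matters for packaging because setuptools' find_packages() only collects directories containing an __init__.py and does not recurse into ones that lack it, so a pipelines folder without the marker (and everything below it) is silently dropped from the built package. A small self-contained demonstration, with hypothetical package names:

```python
import tempfile
from pathlib import Path

from setuptools import find_packages

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A package with the marker file...
    (root / "abcdef").mkdir()
    (root / "abcdef" / "__init__.py").touch()
    # ...and a `pipelines` subfolder WITHOUT one, like the starter produces.
    (root / "abcdef" / "pipelines").mkdir()
    (root / "abcdef" / "pipelines" / "my_pipe").mkdir()
    (root / "abcdef" / "pipelines" / "my_pipe" / "__init__.py").touch()

    found = find_packages(where=tmp)
    # Only 'abcdef' is found; pipelines and my_pipe are skipped entirely,
    # even though my_pipe itself has an __init__.py.
    print(found)
```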
__main__.py is also not present in the starter, which is a bit confusing if one follows the documentation on packaging.
My humble opinion is that it would maybe be clearer if databricks-iris followed the usual structure, which would also allow running different pipelines from the same package. kedro is still a fantastic library.
datajoely
08/01/2023, 9:18 AM