Dev question: I'm trying to update the base starter and other Kedro #questions

Artur Dobrogowski

04/09/2024, 4:23 PM

Dev question: I'm trying to update the base starter (and other starters) to fix some issues. I've been changing the starter in the root of kedro, thinking it's responsible for kedro new processing. Apparently it's not (entirely): when you enable all of the options, it uses spaceflights-pyspark-viz behind the scenes. So now that there is a proliferation of starter versions (4 variants of spaceflights), what ways do you have to keep them consistent? Do you do some sort of cookiecutter repo templating? I wonder how to work with them and stay sane while keeping them all in sync. One of the bugs I wanted to fix seems to live in that other starter rather than in base kedro.

Artur Dobrogowski

04/09/2024, 4:25 PM

Also a bonus question (I haven't checked yet): since many sections are now optional, selected as features at creation time, how do you check in cookiecutter whether a particular section is enabled or not? I can't find an example of that; most of the work seems to be done inside the cookiecutter hook. I wanted to update the starter's README with instructions on how to run tests and build the docs with Sphinx, but only have them appear when the given section is enabled.

Nok Lam Chan

04/09/2024, 4:30 PM

https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pandas-viz/hooks/post_gen_project.py It's filtered with post_gen_project.py
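
A note on the README question above: one way to make sections conditional is to have the post-generation hook strip the ones whose feature was not selected. The sketch below is a minimal illustration under assumptions, not how kedro-starters actually does it: the `<!-- begin:NAME -->` / `<!-- end:NAME -->` markers, the function name `filter_sections`, and the section names `tests` and `docs` are all hypothetical.

```python
import re

# Hypothetical marker syntax (an assumption, not used by kedro-starters):
# a conditional section is wrapped in
#   <!-- begin:NAME --> ... <!-- end:NAME -->
SECTION_RE = re.compile(
    r"<!-- begin:(?P<name>[\w-]+) -->\n(?P<body>.*?)<!-- end:(?P=name) -->\n?",
    re.DOTALL,
)


def filter_sections(text, enabled):
    """Keep the body of each marked section whose name is in `enabled`;
    drop the whole section otherwise. Marker lines are always removed."""

    def keep_or_drop(match):
        return match.group("body") if match.group("name") in enabled else ""

    return SECTION_RE.sub(keep_or_drop, text)


readme = (
    "# My project\n"
    "<!-- begin:tests -->\n"
    "## Running tests\n"
    "Run pytest.\n"
    "<!-- end:tests -->\n"
    "<!-- begin:docs -->\n"
    "## Building docs\n"
    "Run sphinx-build.\n"
    "<!-- end:docs -->\n"
)

# Only the "tests" feature is selected, so the docs section is removed.
result = filter_sections(readme, {"tests"})
print(result)
```

A hook could call something like this on the generated README, passing in whichever features the user selected. How the selection reaches the hook depends on the template and is not shown here.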

Nok Lam Chan

04/09/2024, 4:33 PM

Well, it's two parts and it's not really the most elegant solution. Thus you see 4 different starters: spaceflights, spaceflights-viz, etc. For most tools, it's handled by the cookiecutter post_gen hook; for viz and pyspark, it's handled from kedro, because copy-pasting existing code files is tricky (i.e. settings.py) etc. We never got to explore better options, e.g. copier, or progen, which I've heard about recently.

Juan Luis

04/09/2024, 4:42 PM

I agree there's a fair bit of code duplication and that it's not very sustainable.

Artur Dobrogowski

04/09/2024, 5:08 PM

I don't really like this split into 4 additional starters without a proper way of syncing their common parts.

Artur Dobrogowski

04/09/2024, 5:12 PM

Solutions I'm thinking about could be: a copier repo for the base kedro new that tries to keep the other copies in check, or abandoning the approach of multiple different versions and instead coding the template with logic based on cookiecutter and the selected options (also problematic to maintain and test, in different ways). I also wish there was a test step in CI/CD, let's say at pre-release, for sanity-checking that every combination of options in the base starter and the official starters produces code that works and whose tests pass. Then the tests of those starters could be improved to also cover the specific functionality they introduce (kedro viz, pyspark present, docs buildable, ruff linting throwing no errors, and other base things expected of those starters).

👍🏼 1
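
To make the "every combination of options" idea concrete, here is a minimal sketch of how such a test matrix could be enumerated. The tool names in `TOOLS` are assumed for illustration and should be checked against what your Kedro version offers. `all_tool_selections` and `run_matrix` are hypothetical helpers, and the `check` callable stands in for whatever actually generates a project and runs its tests.

```python
from itertools import combinations

# Assumed tool names, for illustration only; verify against your Kedro version.
TOOLS = ["lint", "test", "log", "docs", "data", "pyspark", "viz"]


def all_tool_selections(tools):
    """Return every subset of `tools`, from the empty selection to all of them."""
    return [
        combo
        for size in range(len(tools) + 1)
        for combo in combinations(tools, size)
    ]


def run_matrix(tools, check):
    """Call check(selection) for each subset and return the selections that fail.

    `check` is a placeholder: in CI it would create a project with that
    selection and run its tests, returning True on success.
    """
    return [sel for sel in all_tool_selections(tools) if not check(sel)]


selections = all_tool_selections(TOOLS)
print(len(selections))  # 2 ** 7 subsets for seven tools
```

With seven options the full matrix is 128 project generations, so a pre-release job rather than a per-commit one, as suggested above, seems like the natural place for it.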

Artur Dobrogowski

04/09/2024, 5:12 PM

I'll keep thinking about a possible approach, but in the meantime I'll just try to make basic changes to fix the starters.

👍🏼 1

Nok Lam Chan

04/09/2024, 5:24 PM

If you end up exploring copier, we would be very interested to learn more about it.
