# questions
a
Hello Kedro experts! Coming here for a bit of advice on a Kedro pipeline design choice. Consider the following scenario:
1. You have catalog entries for datasets A to E.
2. The datasets have different numbers of rows and different schemas.
3. The datasets cannot be joined together to form a master table.
4. Based on the value of a parameter, I want to pick one of the datasets and run the downstream pipeline with it.
5. The pipeline is generic enough to handle any of the datasets.

Defining a node that just selects the data based on a parameter leads to unnecessary I/O. Is there any other design choice I could make here? Thanks! 🙂
d
So we don’t encourage this sort of conditional dynamism or event-based routing, because it breaks the assumptions Kedro makes around reproducibility.
With Kedro in its current form, I’m not sure I’d recommend it here.
m
Could you use a modular pipeline instead, and invoke it either by namespace or by tags?
👍 2
a
@marrrcin Thanks for this. I think yes, I agree that namespacing/tags could be a viable way here.
👍 1