Hello Kedro experts!
Coming here for a bit of advice on a Kedro pipeline design choice.
Consider the following scenario:
1. You have catalog entries for datasets A to E
2. The datasets have different numbers of rows and different schemas
3. The datasets cannot be joined together to form a master table
4. Based on the value of a parameter, you want to pick one of the datasets and run the downstream pipeline with it
5. The pipeline is generic enough to handle any of the datasets
Defining a node that just selects the data based on the parameter leads to unnecessary I/O, since every dataset gets loaded even though only one is used. Is there another design choice I could make here?
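For reference, the selector node I'd like to avoid looks roughly like this. It's only a minimal sketch: the function name `select_dataset`, the parameter name `selected_dataset`, and the output name `selected_data` are placeholders I made up for illustration.

```python
from typing import Any

# Catalog entry names from the scenario above.
DATASET_NAMES = ("A", "B", "C", "D", "E")


def select_dataset(a: Any, b: Any, c: Any, d: Any, e: Any, selected_dataset: str) -> Any:
    """Return the one dataset named by the parameter and drop the rest."""
    candidates = dict(zip(DATASET_NAMES, (a, b, c, d, e)))
    if selected_dataset not in candidates:
        raise ValueError(
            f"Unknown dataset {selected_dataset!r}, expected one of {DATASET_NAMES}"
        )
    return candidates[selected_dataset]


# Wired into a pipeline, this would be something like (hypothetical names):
#   node(
#       select_dataset,
#       inputs=["A", "B", "C", "D", "E", "params:selected_dataset"],
#       outputs="selected_data",
#   )
# Kedro loads all of a node's inputs before calling the function, so all
# five datasets are read from storage even though four are thrown away.
```

The downstream nodes would then consume `selected_data`, which works, but the four unused loads are the I/O I'm trying to get rid of.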
Thanks! 🙂