Kedro is an open-source Python framework for creating maintainable and modular data science code.


Hello everyone! I have a question about best practices. I am building a pipeline that splits a dataset into several dataframes based on a user column (one dataframe per user). I am thinking of implementing this with Dataset Factories. Would this be a suitable/correct solution, or is there a better approach? The examples in the documentation use Dataset Factories to define inputs to load, but I am not sure whether they can also be outputs of pipelines. Thank you!

Dataset Factories can be used for both inputs and outputs. How would you consume the dataframes afterward? It sounds like PartitionedDataset would be more suitable in this case.
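To make that concrete, here is a rough sketch of what the node could look like, not a definitive implementation. When a node's output is a PartitionedDataset, the node returns a dictionary mapping partition names to data, and each entry is saved as its own file. The function name `split_by_user`, the column name `user`, the dataset name `user_partitions`, and the path in the catalog comment are all made up for illustration.

```python
import pandas as pd

# Hypothetical catalog entry (conf/base/catalog.yml); names and path are assumptions:
#
# user_partitions:
#   type: partitions.PartitionedDataset
#   path: data/02_intermediate/users
#   dataset: pandas.CSVDataset
#   filename_suffix: ".csv"


def split_by_user(df: pd.DataFrame, user_col: str = "user") -> dict[str, pd.DataFrame]:
    """Return one DataFrame per distinct user, keyed by partition name."""
    return {
        str(user): group.reset_index(drop=True)
        for user, group in df.groupby(user_col)
    }


# Small standalone check with toy data (not real project data).
df = pd.DataFrame({"user": ["alice", "bob", "alice"], "value": [1, 2, 3]})
partitions = split_by_user(df)
```

In a pipeline you would wire this up as a regular node whose output is the partitioned dataset, e.g. `node(split_by_user, inputs="raw_data", outputs="user_partitions")`, where `raw_data` is again a placeholder name. A downstream node that takes `user_partitions` as input receives a dictionary of partition names to load functions, so it can loop over the users without knowing them in advance.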

Yep, I hadn't thought of that! Consuming them afterward wouldn't be feasible, so PartitionedDataset should work! Thanks for the insight.