#questions

Camilo Piñón

01/02/2024, 1:01 PM
Hello everyone! I have a question about best practices. I am building a pipeline that splits a dataset into several dataframes based on a user column (one dataframe per user). I am thinking of implementing this with Dataset Factories. Would this be a suitable solution, or is there a better approach? The examples in the documentation use Dataset Factories to define inputs to load, but I am not sure whether they can also be pipeline outputs. Thank you!

Nok Lam Chan

01/02/2024, 1:03 PM
Dataset factories can be used for both inputs and outputs. How would you consume the dataframes afterward? It sounds like PartitionedDataset would be more suitable in this case.
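
For illustration, here is a minimal sketch of a node that produces one dataframe per user. It assumes the node's output is registered in the catalog as a partitioned dataset, which saves a dict of partition id to data as one file per key. The function name `split_by_user`, the column name `user`, the `user_` key prefix, and the catalog entry in the comment are all hypothetical and not taken from the thread.

```python
from typing import Dict

import pandas as pd


def split_by_user(df: pd.DataFrame, user_col: str = "user") -> Dict[str, pd.DataFrame]:
    """Return one dataframe per distinct value of `user_col`.

    The dict keys act as partition ids. If the node output is declared as a
    partitioned dataset in the catalog, each entry is written to its own file.
    """
    return {
        f"user_{user}": group.reset_index(drop=True)
        for user, group in df.groupby(user_col)
    }


# Hypothetical catalog entry for the node output (names and paths are
# assumptions; class names depend on your kedro-datasets version):
#
#   users_partitioned:
#     type: partitions.PartitionedDataset
#     path: data/02_intermediate/users
#     dataset: pandas.CSVDataset
#     filename_suffix: ".csv"

example = pd.DataFrame({"user": ["ana", "ana", "bob"], "value": [1, 2, 3]})
partitions = split_by_user(example)
print(sorted(partitions))
```

With this shape the number of users does not need to be known when the pipeline is defined, since the partitions are just the keys of the returned dict.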

Camilo Piñón

01/02/2024, 1:06 PM
Yep, I hadn't thought of that! Consuming afterward wouldn't be feasible, so PartitionedDataset should work! Thanks for the insight.