# questions
c
Hello everyone! I have a question about best practices. I am building a pipeline that splits a dataset into several dataframes based on a user column (one dataframe per user). I am thinking of implementing this with Dataset Factories. Would this be a suitable solution, or is there a better approach? The examples in the documentation use Dataset Factories to define inputs to load, but I am not sure whether they can also be pipeline outputs. Thank you!
n
They can be used for both inputs and outputs. How would you consume the dataframes afterward? PartitionedDataset sounds more suitable in this case.
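A rough sketch of what the node could look like, assuming this is Kedro's `PartitionedDataset`: on save it accepts a dictionary that maps partition ids to data, so the node only needs to return one dataframe per user. The function name `split_by_user`, the column name `user_id`, the catalog entry name `users_partitioned`, and the path are all made up for illustration, and the catalog keys shown in the comment may differ between Kedro versions.

```python
import pandas as pd

# Hypothetical catalog entry (YAML, shown as a comment; key names depend on
# the installed kedro / kedro-datasets version):
#
# users_partitioned:
#   type: partitions.PartitionedDataset
#   path: data/02_intermediate/users
#   dataset: pandas.CSVDataset
#   filename_suffix: ".csv"


def split_by_user(df: pd.DataFrame, user_col: str = "user_id") -> dict[str, pd.DataFrame]:
    """Return one dataframe per user, keyed by a partition id.

    When the node output is a partitioned dataset, each key becomes
    one partition (for example, one file per user).
    """
    return {
        f"user_{user}": group.reset_index(drop=True)
        for user, group in df.groupby(user_col)
    }


if __name__ == "__main__":
    data = pd.DataFrame({"user_id": ["a", "b", "a"], "value": [1, 2, 3]})
    partitions = split_by_user(data)
    for partition_id, frame in partitions.items():
        print(partition_id, len(frame))
```

The number of users does not have to be known up front, since the partitions are created from whatever keys the dictionary contains at run time.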
c
Yep, I hadn't thought of that! Consuming them afterward wouldn't be feasible, so PartitionedDataset should work. Thanks for the insight!