# questions
b
Hey everyone, I'm looking for the "Kedro" way of doing a Monte Carlo sim. I have a very large dataset in Presto, and I want to repeatedly pull samples from it, run each group of samples through a pipeline, and then roll up all of the pipeline's results. Currently I'm thinking of calling the pipeline from outside the Kedro project.
b
I wonder if the following might work:
1. Create a custom dataset that points to your Presto dataset and only reads a sample from it.
2. Create a pipeline describing how to process a single set of samples.
3. Write a simple loop that creates namespaced copies of your base pipeline, one per set of samples, and return the concatenation of these as your Kedro pipeline.
4. Write a collect node that expects `*args` input and give it all of your namespaced datasets.

This would let you run many samples through a single pipeline definition. I guess though you would still have the issue of how to do the next round of sampling 😄
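A rough sketch of the collect step (step 4), in plain Python with no Kedro import so it runs on its own. The names `N_SAMPLES`, `namespaced_outputs`, `collect`, the `sample_{i}` namespace and the `result` output name are all made up for illustration, and it assumes each namespaced copy produces a single number. In Kedro, a node whose `inputs` is a list receives those datasets as positional arguments, which is what makes the `*args` signature work.

```python
from statistics import mean, stdev

N_SAMPLES = 5  # hypothetical number of Monte Carlo draws


def namespaced_outputs(n_samples, output_name="result"):
    """Build the dataset names the namespaced copies would expose.

    Assumes the copies are namespaced as sample_0, sample_1, ... so that
    each output is addressed as '<namespace>.<output_name>'.
    """
    return [f"sample_{i}.{output_name}" for i in range(n_samples)]


def collect(*results):
    """Roll up one numeric result per sample into summary statistics."""
    values = [float(r) for r in results]
    return {
        "n": len(values),
        "mean": mean(values),
        # stdev needs at least two points; fall back to 0.0 for a single draw
        "std": stdev(values) if len(values) > 1 else 0.0,
    }


# The list you would hand to the collect node as its inputs.
collect_inputs = namespaced_outputs(N_SAMPLES)

# Stand-in values for what the five pipeline copies might return.
summary = collect(1.0, 2.0, 3.0, 4.0, 5.0)
```

Because `collect` does not care how many arguments it gets, changing the number of draws only means changing the loop that builds the namespaced copies and this input list.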
👍 1
💡 2