Kedro is an open-source Python framework for creating maintainable and modular data science code.


Hey everyone, I'm looking for the "Kedro" way of doing a Monte Carlo simulation. I have a very large dataset in Presto, and I want to repeatedly pull samples from it, run each group of samples through a pipeline, and then roll up all of the pipeline's results. Currently I'm thinking of calling the pipeline from outside the Kedro project.

I wonder if the following might work:
1. Create a custom dataset that points to your Presto dataset and only reads a sample from it
2. Create a pipeline describing how to process a single set of samples
3. Write a simple loop that creates namespaced copies of your base pipeline - one per set of samples - and return the concatenation of these as your kedro pipeline
4. Write a collect node that expects `*args` input and give it all of your namespaced datasets

This would let you run many samples through a single pipeline definition. I guess, though, you would still have the issue of how to do the next round of sampling :smile:
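
To make the shape of steps 1 to 4 concrete, here is a rough plain-Python sketch of the node functions and the `*args` collect step. It deliberately leaves out the Kedro wiring (the custom dataset class and the namespaced pipeline copies), and every name in it (`draw_sample`, `process_sample`, `collect`, `N_RUNS`, `SAMPLE_SIZE`, the `run_<i>` namespaces) is made up for illustration. The in-memory `population` list just stands in for the Presto table, and the "pipeline" is reduced to a sample mean.

```python
import random
import statistics

N_RUNS = 5          # hypothetical number of Monte Carlo repetitions
SAMPLE_SIZE = 100   # hypothetical number of rows per sample


def draw_sample(population, size, seed):
    """Stand-in for the custom dataset's load step: return one random sample."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def process_sample(sample):
    """Stand-in for the per-sample pipeline; here it only computes the mean."""
    return statistics.fmean(sample)


def collect(*results):
    """Collect node: receives one positional argument per namespaced output."""
    return {
        "n_runs": len(results),
        "mean": statistics.fmean(results),
        "stdev": statistics.stdev(results),
    }


# Namespaced outputs are addressed as "<namespace>.<dataset>", one per run.
output_names = [f"run_{i}.result" for i in range(N_RUNS)]

# Plain-Python stand-in for running the concatenated pipeline.
population = list(range(10_000))  # placeholder for the Presto table
outputs = {
    name: process_sample(draw_sample(population, SAMPLE_SIZE, seed=i))
    for i, name in enumerate(output_names)
}

# The collect step gets every namespaced output unpacked as *args.
summary = collect(*(outputs[name] for name in output_names))
print(summary)
```

In an actual project, the dict comprehension would be replaced by the loop from step 3 that builds one namespaced copy of the base pipeline per run, and `collect` would be a node whose inputs are the list of names in `output_names`.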