I wonder if the following might work:
1. Create a custom dataset that points to your Presto dataset and only reads a sample from it
2. Create a pipeline describing how to process a single set of samples
3. Write a simple loop that creates namespaced copies of your base pipeline - one per set of samples - and returns the concatenation of these as your Kedro pipeline
4. Write a collect node that expects `*args` input and give it all of your namespaced datasets
This would let you run many samples through a single pipeline definition. I guess though you would still have the issue of how to do the next round of sampling 😄
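Steps 2 to 4 could look roughly like the sketch below. Treat it as a starting point rather than tested project code: `SAMPLE_SETS`, `process_sample`, `collect` and the dataset names `sample`, `result` and `all_results` are placeholders I made up, and it assumes a Kedro version where `kedro.pipeline` exposes `node` and `pipeline`. The custom dataset from step 1 is not shown; it would be registered in the catalog under each namespaced input name (for example `sample_a.sample`).

```python
# Hypothetical names for the sets of samples; replace with your own.
SAMPLE_SETS = ["sample_a", "sample_b", "sample_c"]

# Namespacing prefixes dataset names, so "result" becomes "<namespace>.result".
COLLECT_INPUTS = [f"{name}.result" for name in SAMPLE_SETS]


def process_sample(sample):
    """Placeholder for whatever you do with a single set of samples (step 2)."""
    return sample


def collect(*results):
    """Step 4: receive every namespaced output and gather them into one list."""
    return list(results)


def create_pipeline(**kwargs):
    # Imported here so the helpers above stay importable without Kedro.
    from kedro.pipeline import node, pipeline

    # Step 2: the base pipeline, written once for a single set of samples.
    base = pipeline(
        [node(process_sample, inputs="sample", outputs="result", name="process")]
    )

    # Step 3: one namespaced copy of the base pipeline per set of samples.
    copies = [pipeline(base, namespace=name) for name in SAMPLE_SETS]

    # Step 4: a list of inputs is passed to the function positionally,
    # which is what lets a single *args node consume all of them.
    collector = node(
        collect,
        inputs=COLLECT_INPUTS,
        outputs="all_results",
        name="collect_results",
    )

    return pipeline([*copies, collector])
```

With this layout, adding another set of samples only means adding a name to `SAMPLE_SETS` and a matching catalog entry; the base pipeline itself stays untouched.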