# questions
l
Hello, I’m thinking of how to do this workflow: I have a list of inputs to a node, and that list is dynamically created by a previous node. This list I want to feed into a consolidation node. What would be best practice here?
👀 1
d
One node creates the inputs to another node? I'm not sure whether this is what you need, but can you use a `PartitionedDataset`? If not, can you provide a bit more context on what you need to do?
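For reference, when a node takes a `PartitionedDataset` as input, Kedro passes it a dict mapping partition ids to load functions, so a consolidation node can combine however many partitions exist without hard-coding the count. A minimal sketch (the function name and use of pandas are just illustrative):

```python
from typing import Callable

import pandas as pd


def consolidate_partitions(partitions: dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
    """Combine every partition of a PartitionedDataset, however many there are.

    Kedro loads a PartitionedDataset as {partition_id: load_function}, so the
    number of splits is never hard-coded in the node.
    """
    frames = [load() for _, load in sorted(partitions.items())]
    return pd.concat(frames, ignore_index=True)
```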
l
The context is that I have a DS pipeline where the data is first segmented into multiple splits. Each split gets the same treatment (a modelling and validation pipeline), but then the splits are supposed to come together for a final assessment. Ideally I want the whole flow to be reusable, so no hard-coded number of splits anywhere. I know I can use dataset factories, but I’m not sure how I can combine multiple datasets into the final validation pipeline/node
and ideally I’d want the nodes to stay “pure”, so I’d rather not deal with the splits inside the node (maybe that’s the way?)
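To make it concrete, the kind of consolidation node I’d like to keep pure would look roughly like this (names are made up), with the decision of which split outputs to pass in living outside the node:

```python
import pandas as pd


def final_assessment(*split_metrics: pd.DataFrame) -> pd.DataFrame:
    """Pure node: combine whatever per-split metric frames are passed in.

    The node doesn't know how many splits exist; the pipeline definition
    decides which datasets are wired into it.
    """
    return pd.concat(split_metrics, ignore_index=True)
```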
d
Ah. And by "same treatment", you mean, a set of nodes/pipeline, so probably just
PartitionedDataSet
is not right here. I understand what you're asking now, and it's a fairly common ask (we've lumped it under the name "dynamic pipelines". I think somebody may be looking to share should more fleshed-out thoughts on this soon, but right now there's no "official" best practice on how to do this. Let me ping some part of the maintainer team, to see if anybody can/wants to jump in. 🙂
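In the meantime, one common workaround is to assemble the pipeline dynamically at registration time: create one namespaced copy of the reusable modelling pipeline per split and build the consolidation node's input list from the same list of split names. A rough, self-contained sketch (the split names, dataset names, and stub functions are all made up; in practice the split list would typically come from configuration so it's defined in exactly one place):

```python
import pandas as pd
from kedro.pipeline import Pipeline, node, pipeline

# Hypothetical split names; ideally read from config so the splitting node and
# the pipeline registry agree on them without hard-coding a count anywhere else.
SPLITS = ["split_a", "split_b", "split_c"]


def train_and_validate(split_data: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the real per-split modelling + validation logic."""
    return split_data.describe()


def final_assessment(*split_metrics: pd.DataFrame) -> pd.DataFrame:
    """Pure consolidation node, as sketched above."""
    return pd.concat(split_metrics, ignore_index=True)


def modelling_pipeline() -> Pipeline:
    """The reusable per-split pipeline (identical for every split)."""
    return Pipeline(
        [node(train_and_validate, inputs="split_data", outputs="metrics", name="train_and_validate")]
    )


def create_pipeline(**kwargs) -> Pipeline:
    # One namespaced copy per split: dataset names become e.g. "split_a.split_data"
    # and "split_a.metrics", which a single dataset-factory pattern in the catalog
    # (e.g. "{split}.metrics") can match.
    per_split = sum(
        (pipeline(modelling_pipeline(), namespace=split) for split in SPLITS),
        Pipeline([]),
    )

    consolidation = Pipeline(
        [
            node(
                final_assessment,
                inputs=[f"{split}.metrics" for split in SPLITS],
                outputs="overall_assessment",
                name="final_assessment",
            )
        ]
    )

    return per_split + consolidation
```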
👍 1
l
that’d be amazing!
We see this pattern (segmented modelling) very frequently at DataRobot.
m
@Lukas Innig ping me if you’re interested in dynamic pipelines 😉