Iñigo Hidalgo
04/10/2023, 3:02 PM
Deepyaman Datta
04/10/2023, 3:17 PM
Javier del Villar
04/10/2023, 4:39 PM
Deepyaman Datta
04/10/2023, 4:42 PM
> • If you have very big data and/or a slow storage, having lots of nodes can increase the time of your I/O operations a lot.
This isn't an issue if using `MemoryDataSet`s, though. Would never recommend breaking something down into a lot of nodes and persisting each output.
Javier del Villar
04/10/2023, 4:44 PM
CachedDataSet
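
To make the I/O point above concrete, here is a minimal sketch that counts storage calls for a three-step chain under three strategies: keeping intermediates in memory, persisting every intermediate, and persisting with an in-memory copy (roughly the idea behind `CachedDataSet`). It does not use Kedro's API; `CountingStore`, `run_chain`, and `count_io` are hypothetical names invented for this illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


class CountingStore:
    """Hypothetical stand-in for slow storage; counts every save and load."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.io_calls = 0

    def save(self, name: str, value: Any) -> None:
        self.io_calls += 1
        self._data[name] = value

    def load(self, name: str) -> Any:
        self.io_calls += 1
        return self._data[name]


def run_chain(
    steps: List[Callable[[Any], Any]],
    store: CountingStore,
    persist: bool = False,
    cache: bool = False,
) -> Any:
    """Run steps in order, optionally persisting each intermediate result.

    persist=False: intermediates stay in memory (no extra I/O).
    persist=True:  each intermediate is saved, then reloaded from storage.
    cache=True:    intermediates are still saved, but the in-memory copy
                   is reused instead of reloading.
    """
    value = store.load("input")
    last = len(steps) - 1
    for i, step in enumerate(steps):
        value = step(value)
        if persist and i < last:
            store.save(f"step_{i}", value)
            if not cache:
                value = store.load(f"step_{i}")
    store.save("output", value)
    return value


def count_io(persist: bool, cache: bool = False) -> Tuple[Any, int]:
    """Run a fixed three-step chain and report (result, storage calls)."""
    store = CountingStore()
    store.save("input", 5)
    steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    result = run_chain(steps, store, persist=persist, cache=cache)
    return result, store.io_calls


if __name__ == "__main__":
    for label, kwargs in [
        ("in memory", {"persist": False}),
        ("persist every output", {"persist": True}),
        ("persist with cache", {"persist": True, "cache": True}),
    ]:
        result, calls = count_io(**kwargs)
        print(f"{label}: result={result}, storage calls={calls}")
```

The result is the same in all three cases; only the number of storage round trips changes, and it grows with the number of nodes whose outputs are persisted.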
Nero Okwa
04/12/2023, 10:01 AM
Iñigo Hidalgo
04/12/2023, 2:36 PM
I don't think Kedro pipeline design should be determined by how it will be deployed in production
Deepyaman Datta
04/12/2023, 3:47 PM
> The workflow orchestrator question and its partial incompatibility with memorydatasets is what got us talking about this.
I don't think the node-to-task Kedro-to-orchestrator mapping is reasonable, and it makes more sense to map modular pipelines to orchestrator tasks. I assume you have this question because you're doing node-to-task?
Iñigo Hidalgo
04/12/2023, 3:50 PM
Deepyaman Datta
04/12/2023, 4:50 PM
> But if I'm not mistaken, in the Prefect deployment documentation, each individual node is a task.
You're not mistaken, and most of the deployment docs do suggest a per-node deployment; however, they're also rather outdated (the Prefect deployment doc doesn't support Prefect 2.0, and in general a major pain point for the deployment docs is keeping them up to date and in line with the latest thinking).
If you're specifically interested in Prefect, here are some of my thoughts: https://kedro-org.slack.com/archives/C03RKP2LW64/p1678116063310559?thread_ts=1678056372.557949&cid=C03RKP2LW64
Take them with a grain of salt, because I look into some of these things without time to prototype them (but, in my defense, I have spent more time than 99% of people thinking about deployment of Kedro to orchestrators :P).
In general, I'd say the GetInData team is currently a better authority for deployment to workflow orchestrators than the official Kedro docs, and we hope to collaborate with them further to have better answers. (I believe, in our last conversation with them, they were looking to add support for mapping modular pipelines to tasks, but I could be mistaken and haven't checked their plugins lately to see if that is the case.)
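
As a rough illustration of the modular-pipeline-to-task mapping discussed above, the sketch below groups nodes by the modular pipeline they belong to and works out which datasets cross a task boundary. Only those would need to be persisted; everything else could stay in memory within a task. This is plain Python under stated assumptions, not Kedro's or any orchestrator plugin's API: `NodeInfo`, `group_by_namespace`, `datasets_to_persist`, and the example node and dataset names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical, simplified description of a pipeline node."""

    name: str
    namespace: str  # stands in for the modular pipeline the node belongs to
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def group_by_namespace(nodes: List[NodeInfo]) -> Dict[str, List[NodeInfo]]:
    """Build one orchestrator task per modular pipeline, not one per node."""
    tasks: Dict[str, List[NodeInfo]] = defaultdict(list)
    for node in nodes:
        tasks[node.namespace].append(node)
    return dict(tasks)


def datasets_to_persist(nodes: List[NodeInfo]) -> Set[str]:
    """Return datasets produced in one task and consumed in another."""
    producer = {out: node.namespace for node in nodes for out in node.outputs}
    return {
        name
        for node in nodes
        for name in node.inputs
        if name in producer and producer[name] != node.namespace
    }


# Assumed example pipeline: two modular pipelines with two nodes each.
NODES = [
    NodeInfo("clean", "processing", ("raw",), ("cleaned",)),
    NodeInfo("featurize", "processing", ("cleaned",), ("features",)),
    NodeInfo("train", "science", ("features",), ("model",)),
    NodeInfo("evaluate", "science", ("model", "features"), ("metrics",)),
]

if __name__ == "__main__":
    tasks = group_by_namespace(NODES)
    for task_name, members in tasks.items():
        print(task_name, [node.name for node in members])
    print("persist between tasks:", sorted(datasets_to_persist(NODES)))
```

In this example only `features` crosses from one task to the other, whereas a node-per-task mapping would also force `cleaned` and `model` through storage.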