Flavien
10/17/2023, 7:42 AM
Iñigo Hidalgo
10/17/2023, 8:32 AM
Flavien
10/17/2023, 9:04 AM
kedro is a new tool for the team, and a node seemed to be the smallest unit that could be reused. I think we need to find the right spot between one large pipeline containing everything and turning every single node into its own pipeline. In my opinion, a pipeline that is meant to be reused should encapsulate some kind of business or implementation logic. But it is a struggle. 😅
Iñigo Hidalgo
10/17/2023, 9:21 AM
Flavien
10/17/2023, 9:25 AM
Nok Lam Chan
10/17/2023, 9:37 AM
Iñigo Hidalgo
10/17/2023, 9:49 AM
Nok Lam Chan
10/17/2023, 10:26 AM
"have a common local library with utilities functions and create a node on each pipeline which uses the dedicated function;"
It may help to shift the focus of the conversation towards these dimensions rather than how big a pipeline/node should be:
• Reusability
• Easy to test
• Performance (more opportunity to parallelise at the node level)
• Easy to debug (i.e. if something goes wrong in the node, how would you debug? Is there some checkpoint where you can immediately do catalog.load to recover the data, or do you have to rerun a 3 hour pipeline?)
Flavien
10/18/2023, 7:25 AM