Constantin Jukowski
11/03/2023, 8:31 PM
Nok Lam Chan
11/04/2023, 4:43 AM
Constantin Jukowski
11/04/2023, 10:32 AM
before_pipeline_run - does not seem to be able to influence the running process either.
Nok Lam Chan
11/04/2023, 11:38 AM
Florian d
11/06/2023, 9:17 AM
datajoely
11/06/2023, 9:34 AM
Nok Lam Chan
11/06/2023, 11:14 AM
Pipeline API already.
Florian d
11/06/2023, 11:16 AM
David Stanley
11/06/2023, 12:17 PM
Nok Lam Chan
11/06/2023, 12:41 PM
David Stanley
11/06/2023, 1:23 PM
"What would be the definition of checking if the dataset exists, in a way that guarantees it won't accidentally run on outdated data?"
No guarantee. I'm not bothered about running on outdated data in that use case (for my actual use case, it was not possible for raw to change, or if there were changes to the existing pipeline then errors would flag). Perhaps it would just check whether a file exists at the catalog entry filepath; that would do as a starting point functionality-wise, I should think.
"The feature kinda exists: when the pipeline fails, it produces a log that suggests how you can recover the run."
Although one improvement that would be nice is to make that more directly copy-pasteable. Recent experience has been that it does not copy-paste well into the terminal, meaning I have to manually go and fix all the lines.
Nok Lam Chan
11/06/2023, 1:39 PM
"No guarantee. I'm not bothered about running on outdated data in that use case (for my actual use case, it was not possible for raw to change, or if there were changes to the existing pipeline then errors would flag). Perhaps it would just check whether a file exists at the catalog entry filepath; that would do as a starting point functionality-wise, I should think."
Yep, that would be a good head start. Let me know if you ever start working on this or need some extra help.
David Stanley
11/06/2023, 2:03 PM
"Yep, that would be a good head start. Let me know if you ever start working on this or need some extra help."
Not started working on it yet, sorry, a bit busy with things; will let you know if/when I do though. My initial thinking is to wrap the node functions and use their input and output catalog entries: check the inputs against a set of rerun node outputs (initially empty); if any match, add the node's outputs to that set and run the node as normal; otherwise check whether a file exists at the output, and if so skip the node, else run the node as normal.
"Valid comment, I think I have encountered this but haven't heard too many complaints. I created an issue (https://github.com/kedro-org/kedro/issues/3276); let's see if this is a more common problem and we can prioritise it."
Awesome, thanks.
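A minimal sketch of the skip logic David describes above, written in plain Python rather than against Kedro's API. The `Node` dataclass, the `run_with_skips` function, and the `filepaths` mapping (dataset name to path, standing in for catalog entries) are all hypothetical names used only for illustration. One assumption beyond the description: the sketch also adds a node's outputs to the rerun set when the node runs because its output file was missing, so that downstream nodes do not reuse stale files.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a pipeline node (not Kedro's own Node class)."""
    name: str
    func: Callable[[], None]  # in this sketch the function reads/writes files itself
    inputs: List[str]         # dataset names consumed
    outputs: List[str]        # dataset names produced


def run_with_skips(nodes: List[Node], filepaths: Dict[str, Path]) -> List[str]:
    """Run nodes in order, skipping any whose outputs already exist on disk.

    A node is always run if one of its inputs was produced by a node that
    ran earlier in this call. Returns the names of the nodes that ran.
    """
    rerun_outputs: Set[str] = set()  # outputs of nodes that ran in this call
    ran: List[str] = []
    for node in nodes:
        upstream_rerun = any(name in rerun_outputs for name in node.inputs)
        # A node with no outputs has nothing to check, so it is never skipped.
        outputs_exist = bool(node.outputs) and all(
            filepaths[name].exists() for name in node.outputs
        )
        if not upstream_rerun and outputs_exist:
            continue  # output file is present and nothing upstream changed
        node.func()
        rerun_outputs.update(node.outputs)
        ran.append(node.name)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        paths = {name: root / f"{name}.txt" for name in ("raw", "clean", "features")}
        paths["raw"].write_text("raw data")
        paths["clean"].write_text("already computed")

        pipeline = [
            Node("clean", lambda: paths["clean"].write_text("cleaned"), ["raw"], ["clean"]),
            Node("features", lambda: paths["features"].write_text("feats"), ["clean"], ["features"]),
        ]
        print(run_with_skips(pipeline, paths))  # ['features']
        print(run_with_skips(pipeline, paths))  # []
```

As noted in the thread, this gives no guarantee against running on outdated data: a file that exists is treated as up to date, whatever its contents.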