# questions
b
Hi Kedro team... We ran a forecasting pipeline (runs ~20 hours) with around 231 nodes, and it failed around the 198th node... Now I want to run only the remaining 33 nodes, but the error/info logs don't provide a list of the remaining nodes to rerun... Can you please help with this? We definitely don't want to run this forecasting pipeline again for 20 hours.
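(For readers hitting the same situation: when the resume suggestion isn't printed, you can still rerun from specific nodes yourself. A minimal sketch, assuming you can identify the first nodes that did not complete; the node names and pipeline name below are illustrative, and this only helps if those nodes' inputs were persisted rather than held in `MemoryDataSet`:)

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Illustrative: the first nodes that did NOT complete in the failed run.
# Kedro will run these and everything downstream of them.
remaining_entry_nodes = ["fit_model_group_07", "fit_model_group_08"]

bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    # Only works if the inputs of these nodes were persisted by the earlier
    # run; anything that lived in a MemoryDataSet is gone after the failure.
    session.run(pipeline_name="forecasting", from_nodes=remaining_entry_nodes)
```

The CLI equivalent is `kedro run --from-nodes "fit_model_group_07,fit_model_group_08"`.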
d
Are any of the upstream nodes reading from all persisted inputs? I can't tell, because you seem to be using a very custom setup with some `IPythonXXDataSet`s and using Papermill to execute. Can you share one of the custom dataset definitions? If they inherit from `MemoryDataSet` somehow, then Kedro will think you have no nodes reading from all persisted inputs.
b
Hey @Deepyaman Datta, the custom datasets we created inherit from `AbstractDataSet`. We know that is not the problem, as we experience the same issue when running on the command line and using CSV and Parquet datasets.
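(For context, a bare-bones dataset built on `AbstractDataSet` looks roughly like the sketch below; the CSV/pandas specifics are just illustrative, not the actual custom datasets discussed here:)

```python
from pathlib import Path
from typing import Any, Dict

import pandas as pd
from kedro.io import AbstractDataSet


class SimpleCSVDataSet(AbstractDataSet):
    """Illustrative custom dataset that persists a DataFrame to a local CSV."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> pd.DataFrame:
        return pd.read_csv(self._filepath)

    def _save(self, data: pd.DataFrame) -> None:
        data.to_csv(self._filepath, index=False)

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath)}
```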
d
Are you dynamically modifying your catalog?
I tried making a node fail in Spaceflights, and I'm getting the expected behavior, so I don't think it's something that's regressed across the board. So I'd probably need a minimal example or some more context if I'm going to investigate.
b
Yes, the catalog is being modified dynamically. The forecasting steps happen for 21 different groups, so the catalog is updated with a hook that alters the datasets.
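(For anyone following along, a catalog-modifying hook of that kind might look roughly like this; the group names, dataset class, and paths are purely illustrative, and the hook would be registered via `HOOKS` in `settings.py`:)

```python
from kedro.framework.hooks import hook_impl
from kedro.extras.datasets.pandas import ParquetDataSet  # kedro_datasets.pandas on newer versions


class GroupCatalogHook:
    """Illustrative hook: registers one persisted output dataset per forecasting group."""

    GROUPS = [f"group_{i:02d}" for i in range(21)]

    @hook_impl
    def after_catalog_created(self, catalog):
        for group in self.GROUPS:
            # Entries added this way are what the resume suggestion inspects;
            # if a failed node's inputs are missing here (or not persisted),
            # the suggestion can come out empty or wrong.
            catalog.add(
                f"forecast_{group}",
                ParquetDataSet(filepath=f"data/07_model_output/forecast_{group}.parquet"),
            )
```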
d
If you look at the code for how the command to rerun is determined, it depends on looking at the catalog to find ancestor nodes with persisted inputs. If you mess with the catalog (depending on what you do), it could be inaccurate.
👍 1
For reference: https://github.com/kedro-org/kedro/blob/fa8c56fa2e510e6a449f5ac7356f76c167be978a/kedro/runner/runner.py#L215-L280 So, if you want to debug this, since you can replicate it in a local run, see what the state of the catalog is + what the persistent-ancestors lookup gives you, with a small pipeline with an error introduced (and with whatever catalog-modifying hook you've written).
👍 2
🙏 1
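(A quick way to inspect that state locally, sketched under the assumption that you can reproduce the failure with a small pipeline; `_get_dataset` is private API and may differ between Kedro versions:)

```python
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())
with KedroSession.create(project_path=Path.cwd()) as session:
    catalog = session.load_context().catalog  # the catalog AFTER your hook has run

    # Anything not registered in the catalog falls back to MemoryDataSet,
    # so the resume suggestion cannot treat it as a persisted input.
    for name in sorted(catalog.list()):
        dataset = catalog._get_dataset(name)  # private API, illustrative only
        print(f"{name}: {type(dataset).__name__}")
```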
j
hi folks, this is kind of a parallel conversation but I'm curious to know more about your setup. are you triggering Papermill from Kedro nodes to run Jupyter notebooks, or is it the other way around?
b
From the notebooks we have a Kedro `session.run()` that triggers the pipeline run.
j
oh I see, so you use Papermill to launch a Notebook that triggers a Kedro pipeline ✔️
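(For anyone else reading, that setup looks roughly like the sketch below; notebook names, paths, and parameters are illustrative:)

```python
# Driver side: execute the orchestration notebook with Papermill.
import papermill as pm

pm.execute_notebook(
    "notebooks/run_forecasting.ipynb",
    "notebooks/run_forecasting_output.ipynb",
    parameters={"env": "base"},
)
```

```python
# A cell inside run_forecasting.ipynb then triggers the Kedro pipeline.
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path("/path/to/project"))
with KedroSession.create(project_path=Path("/path/to/project")) as session:
    session.run(pipeline_name="forecasting")
```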
b
Thanks, @Deepyaman Datta @Logan Rupert @Rahul Agarwal for reference ^^
👍 1
@Dotun O ^^
👍 1