# questions
p
Another question from my side. I have a node which outputs a dictionary called `train_test_dts`, which I am saving as a pickle with the `joblib` backend. When I then try to run my pipeline with the ParallelRunner like this:
```
kedro run --pipeline feature_engineering --params env=dev,inference_dt=2025-01-05 --runner ParallelRunner
```
Then I am getting the following error:
```
AttributeError: The following datasets cannot be used with multiprocessing: ['train_test_dts']
In order to utilize multiprocessing you need to make sure all datasets are serialisable, i.e. datasets should not make use of lambda functions, nested functions, closures etc.
If you are using custom decorators ensure they are correctly decorated using functools.wraps().
```
Any idea why that happens and what I could do to fix that?
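For context, the setup described above corresponds to a catalog entry along these lines; the bucket name and file path are assumptions for illustration, not details from the thread:

```
# conf/base/catalog.yml -- hypothetical entry matching the question
train_test_dts:
  type: pickle.PickleDataset          # pickle.PickleDataSet on older Kedro versions
  filepath: gs://some-bucket/data/train_test_dts.pkl  # assumed GCS location
  backend: joblib
```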
p
Solved. Apparently one cannot use input from GCS for these steps: I changed the path from GCS to a local folder, and it worked.
👍 1
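The fix amounts to pointing the dataset at a local filepath instead of a gs:// URI. The likely root cause is that the GCS-backed dataset holds an fsspec/gcsfs client object that cannot be pickled, so ParallelRunner's serialisability check rejects it. A sketch of the working entry, with an assumed local path:

```
# conf/base/catalog.yml -- hypothetical fixed entry
train_test_dts:
  type: pickle.PickleDataset          # pickle.PickleDataSet on older Kedro versions
  filepath: data/06_models/train_test_dts.pkl  # local path instead of gs://...
  backend: joblib
```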