Hi all, I am trying to use --runner ParallelRunner...
# questions
q
Hi all, I am trying to use --runner ParallelRunner to speed up my pipeline. If I run the command without ParallelRunner, the pipeline works well. If I run the following command,
kedro run --pipeline pipeline_mr --env local --runner ParallelRunner
it gives me the following error. Any ideas? Thank you! (Python version: 3.9.0, kedro version: 0.18.8, pandas: 1.4.4)
d
Is that dataset explicitly declared in the catalog as a
MemoryDataSet
?
q
That dataset is specified the same way as the other datasets in the catalog, and is saved as Parquet data.
d
It says that we’re hitting a deadlock in a
MemoryDataSet
- are any of these explicitly declared in your catalog?
q
No, I didn’t see anything explicitly defined as
MemoryDataSet
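A rough way to double-check this, offered as a sketch rather than anything Kedro ships: as far as I know, any dataset a pipeline uses but the catalog does not declare is handled as an implicit MemoryDataSet, so "nothing explicitly defined" does not rule them out. The helper below works on a plain dict shaped like a parsed catalog.yml; the function name `find_memory_datasets`, the catalog entries, and the dataset names are all hypothetical, and it makes no Kedro calls.

```python
from typing import Dict, Iterable, List, Tuple


def find_memory_datasets(
    catalog_config: Dict[str, dict],
    pipeline_datasets: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (explicit, implicit) in-memory dataset names.

    explicit: catalog entries whose `type` ends in "MemoryDataSet"
    implicit: names the pipeline uses that the catalog does not declare
    """
    explicit = sorted(
        name
        for name, entry in catalog_config.items()
        if isinstance(entry, dict)
        # Accept both "MemoryDataSet" and "kedro.io.MemoryDataSet"
        and str(entry.get("type", "")).split(".")[-1] == "MemoryDataSet"
    )
    implicit = sorted(
        name for name in pipeline_datasets if name not in catalog_config
    )
    return explicit, implicit


# Hypothetical catalog contents and pipeline dataset names, for illustration only
catalog_config = {
    "raw_table": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/01_raw/raw_table.parquet",
    },
    "scratch": {"type": "MemoryDataSet"},
}
pipeline_datasets = ["raw_table", "scratch", "intermediate_frame"]

explicit, implicit = find_memory_datasets(catalog_config, pipeline_datasets)
print("explicitly declared MemoryDataSet:", explicit)
print("not in catalog (implicit in-memory):", implicit)
```

Note that parameter inputs such as `params:...` are not catalog entries either, so they would show up in the second list and can be ignored.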
d
Are you using any Python libraries that do their own parallelisation, e.g. distributed training?
Especially in the node that fails.
q
No, the failed nodes mainly do data processing with pandas.