#questions

Qiuyi Chen

08/31/2023, 1:32 PM
Hi all, I am trying to use --runner ParallelRunner to speed up my pipeline. Without ParallelRunner the pipeline works well, but when I run the following command,
kedro run --pipeline pipeline_mr --env local --runner ParallelRunner
it gives me the following error. Any ideas? Thank you! (Python version: 3.9.0, kedro version: 0.18.8, pandas version: 1.4.4)

datajoely

08/31/2023, 1:38 PM
Is that dataset explicitly declared in the catalog as a MemoryDataSet?

Qiuyi Chen

08/31/2023, 1:40 PM
That dataset is specified the same way as the other datasets in the catalog, and is saved as Parquet data.

datajoely

08/31/2023, 1:42 PM
It says that we’re hitting a deadlock in a MemoryDataSet - are any of these explicitly declared in your catalog?
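A minimal sketch of one way to check this, assuming the catalog entries have already been loaded into a plain dict (for example from catalog.yml). The helper name find_memory_datasets and the example entries are hypothetical, not part of Kedro. Note that a dataset which is not listed in the catalog at all is also held in memory by default, so this scan only finds the explicit declarations.

```python
def find_memory_datasets(catalog_config):
    """Return the names of catalog entries whose type mentions MemoryDataSet."""
    return sorted(
        name
        for name, entry in catalog_config.items()
        if isinstance(entry, dict) and "MemoryDataSet" in str(entry.get("type", ""))
    )


# Hypothetical catalog entries, shaped like a loaded catalog.yml
example_catalog = {
    "raw_table": {
        "type": "pandas.ParquetDataSet",
        "filepath": "data/01_raw/raw_table.parquet",
    },
    "intermediate_table": {"type": "MemoryDataSet"},
}

print(find_memory_datasets(example_catalog))
```

An empty result would suggest that no entry is explicitly declared as a MemoryDataSet, which narrows the search to datasets that are missing from the catalog.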

Qiuyi Chen

08/31/2023, 1:54 PM
No, I didn’t see anything explicitly defined as MemoryDataSet.

datajoely

08/31/2023, 1:54 PM
Are you using any Python libraries which do their own parallelisation, e.g. distributed training?
Especially in the node that fails.

Qiuyi Chen

08/31/2023, 1:56 PM
No, the failed nodes mainly do data processing using pandas.