Kedro is an open-source Python framework for creating maintainable and modular data science code.


[Attached image: Screenshot 2023-08-31 at 8.59.17 AM.png]

Hi all, I am trying to use `--runner ParallelRunner` to speed up my pipeline. Without `ParallelRunner`, the pipeline works well. But when I run the following command, `kedro run --pipeline pipeline_mr --env local --runner ParallelRunner`, I get the following error.

Any ideas? Thank you! (Python version 3.9.0, kedro version: 0.18.8, pandas: 1.4.4)

Is that dataset explicitly declared in the catalog as a `MemoryDataSet`?

That dataset is specified the same way as the other datasets in the catalog, and it is saved as Parquet data.

[Attached image: image.png]

It says that we’re hitting a deadlock in a `MemoryDataSet` - are any of these explicitly declared in your catalog?

No, I didn’t see anything explicitly defined as `MemoryDataSet`
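A side note on checking this: in Kedro, a dataset that a node reads or writes but that has no catalog entry is kept in memory by default, so an in-memory dataset can be involved even when nothing is declared as `MemoryDataSet`. A minimal sketch of how to spot such datasets is below. It is plain Python and does not call Kedro; the dataset names and the helper `find_undeclared_datasets` are hypothetical, made up for illustration.

```python
def find_undeclared_datasets(pipeline_datasets, catalog_entries):
    """Return dataset names used by the pipeline but missing from the catalog."""
    return sorted(
        name
        for name in pipeline_datasets
        if name not in catalog_entries
        # Parameters are not catalog datasets, so skip them.
        and name != "parameters"
        and not name.startswith("params:")
    )


# Hypothetical names, for illustration only.
pipeline_datasets = {"raw_orders", "clean_orders", "features", "params:test_size"}
catalog_entries = {"raw_orders", "features"}

undeclared = find_undeclared_datasets(pipeline_datasets, catalog_entries)
print(undeclared)  # ['clean_orders']
```

In a real project the two sets would have to come from the pipeline object and the catalog. On 0.18.x I believe `Pipeline.data_sets()` and `DataCatalog.list()` return these names, but please verify that against the docs for your version.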

Are you using any Python libraries that do their own parallelisation, e.g. distributed training?

No, in the failed nodes, it is mainly data processing using pandas