# questions
b
Hello all, I’m planning to run multiple instances of the same Kedro pipeline in parallel for inference tasks. I want to make sure that the intermediate files from these parallel runs don’t interfere with each other. Could anyone provide advice or best practices for managing this? Are there specific configurations or precautions to ensure the intermediate files remain separate? Thank you!
m
I would add an OmegaConf resolver with caching to generate a unique ID for every run, and put that ID into all paths (especially in the data catalog) that you feel should be kept separate:
```python
from uuid import uuid4
from omegaconf import OmegaConf

# use_cache=True memoizes the resolver, so every ${run_id:}
# reference in the config resolves to the same value per process
OmegaConf.register_new_resolver(
    "run_id", lambda: uuid4().hex, use_cache=True
)
```
Usage:
```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/${run_id:}/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/${run_id:}/reviews.csv
```
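If you're on Kedro's OmegaConfigLoader, custom resolvers are usually registered through CONFIG_LOADER_ARGS in settings.py rather than by calling OmegaConf directly. Here's a minimal sketch, assuming that hook is available in your Kedro version; I'm not certain it forwards use_cache, so the lru_cache below stands in for it:

```python
# settings.py -- sketch; assumes Kedro's OmegaConfigLoader custom_resolvers hook
from functools import lru_cache
from uuid import uuid4

@lru_cache(maxsize=1)  # one cached value per process, mimicking use_cache=True
def _run_id() -> str:
    return uuid4().hex

CONFIG_LOADER_ARGS = {"custom_resolvers": {"run_id": _run_id}}
```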
+ it also depends on how you will parallelize it... Is it an external process?
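If it is separate OS processes (e.g. several independent `kedro run` invocations), each process evaluates the resolver fresh and so gets its own run ID, keeping the catalog paths distinct. A rough sketch of that external-process case (the count of three runs is just for illustration):

```python
# sketch: launch three independent `kedro run` processes in parallel;
# each process registers and resolves its own run_id, so the
# data/01_raw/<run_id>/... paths in the catalog do not collide
import subprocess

procs = [subprocess.Popen(["kedro", "run"]) for _ in range(3)]
for proc in procs:
    proc.wait()
```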