# questions
b
Hello all, I’m planning to run multiple instances of the same Kedro pipeline in parallel for inference tasks. I want to make sure that the intermediate files from these parallel runs don’t interfere with each other. Could anyone provide advice or best practices for managing this? Are there specific configurations or precautions to ensure the intermediate files remain separate? Thank you!
m
I would add an OmegaConf resolver with caching to generate a unique ID for every run, and put that ID into all paths (especially in the data catalog) that you feel should be kept separate:
```python
from uuid import uuid4
from omegaconf import OmegaConf

# use_cache=True memoizes the resolver, so every ${run_id:}
# reference in the config resolves to the same value per process
OmegaConf.register_new_resolver(
    "run_id", lambda: uuid4().hex, use_cache=True
)
```
Usage:
```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/${run_id:}/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/${run_id:}/reviews.csv
```
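If you're on Kedro's OmegaConfigLoader, custom resolvers are usually registered through CONFIG_LOADER_ARGS in settings.py rather than by calling OmegaConf directly. Here's a minimal sketch, assuming that hook is available in your Kedro version; I'm not certain it forwards use_cache, so the lru_cache below stands in for it:

```python
# settings.py -- sketch; assumes Kedro's OmegaConfigLoader custom_resolvers hook
from functools import lru_cache
from uuid import uuid4

@lru_cache(maxsize=1)  # one cached value per process, mimicking use_cache=True
def _run_id() -> str:
    return uuid4().hex

CONFIG_LOADER_ARGS = {"custom_resolvers": {"run_id": _run_id}}
```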
+ it also depends on how you will parallelize it... Is it an external process?
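If it is separate OS processes (e.g. several independent `kedro run` invocations), each process evaluates the resolver fresh and so gets its own run ID, keeping the catalog paths distinct. A rough sketch of that external-process case (the count of three runs is just for illustration):

```python
# sketch: launch three independent `kedro run` processes in parallel;
# each process registers and resolves its own run_id, so the
# data/01_raw/<run_id>/... paths in the catalog do not collide
import subprocess

procs = [subprocess.Popen(["kedro", "run"]) for _ in range(3)]
for proc in procs:
    proc.wait()
```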