Hugo Evers
10/23/2023, 8:24 AM
`if __name__ == '__main__'` solves a lot of problems. So if you have problems with processes, try writing your code inside that context.
And indeed I ran into issues when running parallelformers on AWS Batch on a p3.16xlarge instance with 8 GPUs, so it's running in Kedro in a Docker container.
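(The `__main__` guard mentioned above can be sketched with a toy example, not from the thread; `square` is a made-up function:)

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    # With the "spawn" start method (default on macOS/Windows, and the
    # CUDA-safe choice on Linux), each worker re-imports this module;
    # the guard keeps the pool creation from running again in children.
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```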
datajoely
10/23/2023, 9:31 AM
Are you using `ParallelRunner` as well? It may interfere.
Hugo Evers
10/23/2023, 9:48 AM
I'm using `SequentialRunner` and finding it causes the issues that were mentioned.

datajoely
10/23/2023, 9:51 AM

Hugo Evers
10/23/2023, 9:53 AM
Running

```python
import torch
from transformers import TrainingArguments

# get the number of GPUs
num_gpus = torch.cuda.device_count()
if num_gpus > 1:
    from parallelformers import parallelize
    parallelize(model, num_gpus=num_gpus, fp16=True, verbose="detail")
```

inside a Kedro node gives:

```
RuntimeError: Timed out initializing process group in store based barrier on rank: 7, for key: store_based_barrier_key:1 (world_size=8, worker_count=9, timeout=0:30:00)
WARNING  No nodes ran. Repeat the previous command to attempt a new run.  runner.py:213
[10/15/23 12:57:26] ERROR  Node 'sort_using_baal: func[redacted]) -> [redacted]' failed with error: Timed out initializing process group in store based barrier on rank: 7, for key: store_based_barrier_key:1 (world_size=8, worker_count=9, timeout=0:30:00)  node.py:356
```
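(One way to apply the `if __name__ == '__main__'` advice from earlier in the thread to the snippet above is to defer all GPU work into a function, so that processes re-importing the module do not touch CUDA or the process group at import time. A plausible cause of `worker_count=9` exceeding `world_size=8` is an extra process joining the group. Sketch only; `maybe_parallelize` is a hypothetical helper name:)

```python
def maybe_parallelize(model, fp16=True):
    # Lazy imports keep this module side-effect free on import, so
    # spawned workers that re-import the file don't initialize CUDA
    # or join the process group a second time.
    import torch
    num_gpus = torch.cuda.device_count()
    if num_gpus > 1:
        from parallelformers import parallelize
        parallelize(model, num_gpus=num_gpus, fp16=fp16, verbose="detail")
    return model
```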
## Environment
python 3.10.1
parallelformers: latest
os: Ubuntu

Juan Luis
10/23/2023, 11:32 AM