# questions
j
Hi team, has anyone used Kedro in a multi-GPU training setup? Would love to ask a few questions on how best to set up the repo. We are using Databricks and MLflow, and are trying to assess whether Kedro can handle multi-GPU training in a straightforward way. Thanks!
d
Have you talked to Shashwat Dalal re the GPU setup? He's probably the best person I can think of on the topic. I do recall hearing from somebody that there were some challenges (maybe Stephen Simpson?), but just talk to Shashwat if you haven't already. 😛
n
Did you hit any particular issues? From my understanding, Databricks and MLflow don't do anything GPU-specific; they just provide GPU infrastructure on which you can train a deep learning model.
There are two types of parallelization:
• DataParallel – copy the same model to each GPU and have each one process its own slice of the batch
• Model parallel – split a single model across GPUs
DataParallel is usually the easier and more straightforward option; with a library like `torch` it's just a few lines of extra code (see the sketch below): https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
Model inference can be quite different: something like Ray can be useful if you need to handle a large volume of requests. For batch inference I don't think there is much extra performance to squeeze out.
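To make the "few lines of extra code" concrete, here is a minimal sketch along the lines of the linked tutorial. The `Net` model and the dummy batch are hypothetical stand-ins, not anything Kedro- or Databricks-specific:
```python
import torch
import torch.nn as nn

# Hypothetical toy model; stands in for whatever model you already train.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net()

# The "few lines of extra code": wrap the model so each forward pass
# splits the input batch across all visible GPUs and gathers the outputs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

# The training loop itself is unchanged; DataParallel handles scatter/gather.
x = torch.randn(64, 10).to(device)
out = model(x)
print(out.shape)  # torch.Size([64, 2])
```
(Worth noting: the PyTorch docs recommend `DistributedDataParallel` over `DataParallel` for serious multi-GPU training, but `DataParallel` is the quickest way to try things out.)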