# questions
j
Hi team, has anyone used Kedro in a multi-GPU training setup? Would love to ask a few questions on how best to set up the repo. We are using Databricks and MLflow, and are trying to assess whether Kedro can handle multi-GPU training in a straightforward way. Thanks!
d
Have you talked to Shashwat Dalal re the GPU setup? He's probably the best person I can think of on the topic. I do recall hearing from somebody that there were some challenges (maybe Stephen Simpson?), but just talk to Shashwat if you haven't already. 😛
n
Did you hit any particular issues? From my understanding, Databricks and MLflow don't do anything GPU-specific; they just provide GPU infrastructure on which you can train a deep learning model.
There are two types of parallelization:
• DataParallel – copy the same model to each GPU and have each one process its own slice of the batch
• Model parallel – split a single model across GPUs
DataParallel is usually the easier and more straightforward option; with a library like `torch` it's just a few lines of extra code (see the sketch below): https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
Model inference can be quite different: something like Ray can be useful if you need to handle a large volume of requests. For batch inference I don't think there is much extra performance to squeeze out.
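To make the "few lines of extra code" concrete, here is a minimal sketch along the lines of the linked tutorial. The `Net` model and the dummy batch are hypothetical stand-ins, not anything Kedro- or Databricks-specific:
```python
import torch
import torch.nn as nn

# Hypothetical toy model; stands in for whatever model you already train.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Net()

# The "few lines of extra code": wrap the model so each forward pass
# splits the input batch across all visible GPUs and gathers the outputs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

# The training loop itself is unchanged; DataParallel handles scatter/gather.
x = torch.randn(64, 10).to(device)
out = model(x)
print(out.shape)  # torch.Size([64, 2])
```
(Worth noting: the PyTorch docs recommend `DistributedDataParallel` over `DataParallel` for serious multi-GPU training, but `DataParallel` is the quickest way to try things out.)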