# plugins-integrations
a
Hi folks, I've been using the `kedro-azureml` plugin and I see performance differences when I run the pipeline locally vs on the AzureML cluster. I understand that each node spins up a container and this may contribute to additional overhead. Is there any way to make the pipelines run faster (without throwing more resources at them)? Are there any optimizations that could possibly be made?
m
This will happen with any container-level orchestration tool that you use with Kedro, whether it's Airflow, Prefect, SageMaker, Vertex AI or Azure. Since those tools usually map Kedro nodes 1-to-1 to steps in the engine, there will always be non-zero overhead. To mitigate that, we’ve introduced a node-grouping feature in Vertex AI that allows running more than one Kedro node in a single “step” in the orchestrator. It can be ported to Kedro-AzureML (see https://github.com/getindata/kedro-azureml/issues/84 ); if you’re willing to contribute, we can guide you in implementing it.
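To make the grouping idea concrete, here is a minimal sketch of how it could be expressed on the pipeline side, assuming a tag-based convention similar to the one kedro-vertexai uses; the `group.` prefix, node functions and dataset names are purely illustrative, and the actual kedro-azureml mechanism is still under discussion in the issue linked above.
```python
from kedro.pipeline import node, pipeline


# Placeholder node functions, for illustration only.
def clean_raw_data(raw_data):
    return raw_data


def engineer_features(clean_data):
    return clean_data


def train_model(features):
    return features


def create_pipeline(**kwargs):
    # Nodes sharing the same "group." tag are candidates to be collapsed into
    # a single orchestrator step once grouping is supported by the plugin.
    return pipeline(
        [
            node(clean_raw_data, "raw_data", "clean_data", tags=["group.preprocessing"]),
            node(engineer_features, "clean_data", "features", tags=["group.preprocessing"]),
            node(train_model, "features", "model", tags=["group.training"]),
        ]
    )
```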
c
You can check in the Azure ML logs how long it took for the container to start up. One way to mitigate startup times is to leave the cluster (or a compute instance) always on and run on that, but that costs more (I used to work on Azure ML).
❤️ 3
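For reference, here is a minimal sketch of the "keep the cluster warm" approach with the v1 azureml-core SDK (the same SDK used in the snippet further down this thread); the placeholder names, VM size and timeout values are assumptions, not recommendations.
```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

workspace = Workspace("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP_NAME>", "<AZURE_ML_WORKSPACE_NAME>")

# Keep at least one node up so containers start on an already-provisioned machine.
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=1,                         # > 0 avoids scaling down to zero between runs
    max_nodes=2,
    idle_seconds_before_scaledown=1800,  # how long idle nodes stay up before scaling down
)
cluster = ComputeTarget.create(workspace, "<COMPUTE_CLUSTER_NAME>", config)
cluster.wait_for_completion(show_output=True)
```
Setting `min_nodes` above zero skips the cold start for each job, at the cost of paying for the idle node.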
a
Optimisations: possibly node clustering to schedule them together in a single container. Not sure if AzureML has adopted this feature yet.
a
So I made the following changes:
1. Moved to a 2-node cluster (it was 1 before), to check if it runs independent Kedro nodes on different nodes of the cluster. (Is this going to work by default?)
2. Kept the cluster alive, so the nodes are already provisioned.
As a result of the two changes above, the overall time went down from 10m to 4m, which is good. But a good amount of time is still being lost in the "In queue" step. Is there any way to reduce this too?
c
I don't think so. I think the queue step is just the Azure ML compute scheduler being slow, and I'm not sure there's anything you can do to reduce it. It may be worth getting in touch with them if it's still an issue (they're pretty responsive to the smiley/frowny feedback widget in the top right of the GUI, assuming that's still there).
👍 1
a
I also see that with larger data sizes, the job fails with an out-of-memory error. I could change the `cpu` and `memory` of the containers if I were using the `azureml` SDK directly, as below:
```python
# Install azureml-core package first: pip install azureml-core

from azureml.core import RunConfiguration, Experiment, Workspace, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

workspace = Workspace("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP_NAME>", "<AZURE_ML_WORKSPACE_NAME>")

...

cluster = workspace.compute_targets['<COMPUTE_CLUSTER_NAME>']
run_config = RunConfiguration()
# Define the number of CPU cores and the amount of memory to be used by the Docker container instance.
run_config.docker = DockerConfiguration(use_docker=True, arguments=["--cpus=16", "--memory=128g"], shm_size="64M")
```
Given that I'm using the plugin, I cannot find a way to do or change something like the above:
`DockerConfiguration(use_docker=True, arguments=["--cpus=16", "--memory=128g"])`
Is there any way the Docker configs can be changed?
c
I’m not familiar with the kedro plugin so I don’t know — could you open an issue to ask the maintainers?
👍 1
a
Can do that. Also checking if @marrrcin would know?