# plugins-integrations
a
Hi folks, I've been using the `kedro-azureml` plugin and I see performance differences when I run the pipeline locally vs on the AzureML cluster. I understand that each node spins up a container and this may contribute to additional overhead. Is there any way to make the pipelines run faster (without throwing more resources at them)? Are there any optimizations that could possibly be made?
m
This will happen with any container-level orchestration tool that you use with Kedro, whether it's Airflow, Prefect, SageMaker, Vertex AI or Azure. Since those tools usually map Kedro nodes 1-to-1 to steps in the engine, there will always be non-zero overhead. To mitigate that, we’ve introduced a node-grouping feature in Vertex AI that allows running more than one Kedro node in a single “step” in the orchestrator. It can be ported to Kedro-AzureML (see https://github.com/getindata/kedro-azureml/issues/84 ); if you’re willing to contribute, we can guide you in implementing it.
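To make the grouping idea concrete, here is a minimal sketch of how it could be expressed on the pipeline side, assuming a tag-based convention similar to the one kedro-vertexai uses; the `group.` prefix, node functions and dataset names are purely illustrative, and the actual kedro-azureml mechanism is still under discussion in the issue linked above.
```python
from kedro.pipeline import node, pipeline


# Placeholder node functions, for illustration only.
def clean_raw_data(raw_data):
    return raw_data


def engineer_features(clean_data):
    return clean_data


def train_model(features):
    return features


def create_pipeline(**kwargs):
    # Nodes sharing the same "group." tag are candidates to be collapsed into
    # a single orchestrator step once grouping is supported by the plugin.
    return pipeline(
        [
            node(clean_raw_data, "raw_data", "clean_data", tags=["group.preprocessing"]),
            node(engineer_features, "clean_data", "features", tags=["group.preprocessing"]),
            node(train_model, "features", "model", tags=["group.training"]),
        ]
    )
```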
c
You can check in the Azure ML logs how long it took for the container to start up. One way to mitigate startup times is to leave the cluster (or a compute instance) always on and run on that, but that costs more (I used to work on Azure ML).
❤️ 3
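For reference, here is a minimal sketch of the "keep the cluster warm" approach with the v1 azureml-core SDK (the same SDK used in the snippet further down this thread); the placeholder names, VM size and timeout values are assumptions, not recommendations.
```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

workspace = Workspace("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP_NAME>", "<AZURE_ML_WORKSPACE_NAME>")

# Keep at least one node up so containers start on an already-provisioned machine.
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=1,                         # > 0 avoids scaling down to zero between runs
    max_nodes=2,
    idle_seconds_before_scaledown=1800,  # how long idle nodes stay up before scaling down
)
cluster = ComputeTarget.create(workspace, "<COMPUTE_CLUSTER_NAME>", config)
cluster.wait_for_completion(show_output=True)
```
Setting `min_nodes` above zero skips the cold start for each job, at the cost of paying for the idle node.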
a
Optimisations: possibly node clustering to schedule them together in a single container. Not sure if AzureML has adopted this feature yet.
a
So I made the following changes:
1. Moved to a 2-node cluster (it was 1 before), to check if it runs independent Kedro nodes on different nodes of the cluster. (Is this going to work by default?)
2. Kept the cluster alive, so the nodes are already provisioned.
As a result of the two changes above, the overall time went down from 10m to 4m, which is good. But a good amount of time is still being lost in the "In queue" step. Is there any way to reduce this too?
c
I don't think so. I think the queue step is just the Azure ML compute scheduler being slow, and I'm not sure there's anything you can do to reduce it. It may be worth getting in touch with them if it's still an issue (they're pretty responsive to the smiley/frowny feedback widget in the top right of the GUI, assuming that's still there).
👍 1
a
I also see that with larger data sizes, the job fails with an out-of-memory error. I could change the `cpu` and `memory` of the containers if I were using the `azureml` SDK directly, as below:
```python
# Install azureml-core package first: pip install azureml-core

from azureml.core import RunConfiguration, Experiment, Workspace, ScriptRunConfig, Environment
from azureml.core.runconfig import DockerConfiguration

workspace = Workspace("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP_NAME>", "<AZURE_ML_WORKSPACE_NAME>")

...

cluster = workspace.compute_targets['<COMPUTE_CLUSTER_NAME>']
run_config = RunConfiguration()
# Define the number of CPU cores and the amount of memory to be used by the Docker container instance.
run_config.docker = DockerConfiguration(use_docker=True, arguments=["--cpus=16", "--memory=128g"], shm_size="64M")
```
Given that I'm using the plugin, I cannot find a way to do or change something like the above:
`DockerConfiguration(use_docker=True, arguments=["--cpus=16", "--memory=128g"])`
Is there any way the Docker configs can be changed?
c
I’m not familiar with the kedro plugin so I don’t know — could you open an issue to ask the maintainers?
👍 1
a
Can do that. Also checking if @marrrcin would know?