Hi @marrrcin and Kedro Community, I would like to ask about Kedro #plugins-integrations

Muhammad Ghazalli

06/27/2023, 8:36 AM

Hi @marrrcin and Kedro Community, I would like to ask about the kedro-azureml plugin. I have concerns about Azure pipeline performance. Why does the same pipeline take longer to finish when I use kedro run in an Azure pipeline than when I use kedro run on my local machine?

marrrcin

06/27/2023, 8:39 AM

The answer is really simple: in Azure ML, every Kedro node runs in a separate container, potentially even on different virtual machines, and there is a data serialization overhead between the nodes. Locally, you run everything on a single machine, with shared memory. The benefits of using Azure ML (or any other managed cloud ML toolkit like Vertex AI or SageMaker) start to emerge once your pipelines & nodes are more CPU/memory intensive (e.g. larger data or larger models).
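To make the serialization point concrete, here is a minimal sketch in plain Python (not Kedro or kedro-azureml code). The names `node_a`, `node_b`, `run_in_process` and `run_with_handoff` are hypothetical, and `pickle` only stands in for whatever dataset format a real deployment would use. It compares passing an intermediate result in memory with round-tripping it through a serializer between two steps. A real per-node-container run would add storage I/O and container start-up time on top of this.

```python
import pickle
import time


def node_a(n):
    # Hypothetical first node: produce some intermediate data.
    return list(range(n))


def node_b(data):
    # Hypothetical second node: consume the intermediate data.
    return sum(data)


def run_in_process(n):
    # Local-style run: node_b gets node_a's output straight from memory.
    return node_b(node_a(n))


def run_with_handoff(n):
    # Per-node-style run: the intermediate result is serialized and
    # deserialized between the two nodes (in memory only in this sketch).
    payload = pickle.dumps(node_a(n))
    return node_b(pickle.loads(payload))


def timed(fn, n):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_in_process, run_with_handoff):
        result, seconds = timed(fn, 1_000_000)
        print(f"{fn.__name__}: result={result}, {seconds:.4f}s")
```

Both variants return the same result; only the handoff between the two steps differs, which is where the extra time goes.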

Muhammad Ghazalli

06/27/2023, 9:00 AM

Ooh, I see. From your experience, do you have any recommendations regarding this? Is it possible to run the pipeline as a whole rather than node by node?

marrrcin

06/27/2023, 9:34 AM

I don’t get it 🤔

Muhammad Ghazalli

06/27/2023, 9:49 AM

I mean, if I have pipeline A that consists of Node A and Node B, the Azure pipeline will execute Node A and Node B separately, right? Is it possible to run the whole of pipeline A in a single process? Because if I'm using 2 cores, for example, each process only utilizes one core, and if there are 4 jobs they will queue and wait for an available core.

marrrcin

06/27/2023, 10:58 AM

Yes, it is technically perfectly feasible to do this

🙏 1

Deepyaman Datta

06/27/2023, 12:15 PM

@Muhammad Ghazalli when you say "run a whole pipeline A in a single process", just to be clear on this example... does your full pipeline have multiple subpipelines, and you're looking to run each subpipeline in a separate process? Or you just want to run everything in one container?

Muhammad Ghazalli

06/28/2023, 12:13 AM

@Deepyaman Datta I want to run my pipelines in one container, just to speed things up. My pipeline is really simple; it's just one pipeline with many nodes/steps.

Deepyaman Datta

06/28/2023, 1:21 PM

@Muhammad Ghazalli do you really need a plugin then? IMO the purpose of a deployment plugin like this is to do the more laborious conversion of nodes. I think, at some point in the future, it could be nice to have options to deploy at different levels (e.g. each modular pipeline in a container vs. each node as a separate container), but if it's just everything in one container, it's more about lifting-and-shifting your execution environment to that one container.

marrrcin

06/29/2023, 8:05 AM

@Deepyaman Datta we plan to introduce a concept of “groups” to the plugins, so that the nodes will not have to be necessarily mapped 1:1

🙌 2

Muhammad Ghazalli

06/29/2023, 10:38 AM

@Deepyaman Datta That's true. In my case, I want to move my Kedro pipelines from AKS into an AML Workspace and utilize Azure Pipelines. Everything works like a charm in AKS with 1 node (4 cores, 16 GB memory). If I migrate to AML I also need to calculate the cost; for now, if it requires more compute power, moving to AML is not feasible. You are right that it doesn't make sense to move all Kedro nodes into one container (what would be the difference from deploying on AKS, right?), but I need to explore all possible options and weigh the pros and cons. @marrrcin great idea! Can't wait.

👍 1

Deepyaman Datta

06/29/2023, 12:56 PM

Quoting marrrcin: "@Deepyaman Datta we plan to introduce a concept of “groups” to the plugins, so that the nodes will not have to be necessarily mapped 1:1"

I always thought it made sense to map modular pipelines to containers, but, when you mention this, maybe it's better and more flexible to be able to define groups. After all, wanting to change which container some logic gets deployed to shouldn't mean you need to restructure your pipeline. Also, there are a lot of challenges with separating a Kedro pipeline into modular pipelines programmatically after the fact; defining groups sounds a lot more technically reasonable. Looking forward to seeing it, too!
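As a rough illustration of the "groups" idea discussed here, the following plain-Python sketch collapses an ordered list of node names into execution units, where each unit would correspond to one container. This is not the kedro-azureml API; `group_nodes`, the node names and the group name are all hypothetical. It also assumes that the members of a group are adjacent in the execution order, so that grouping cannot break dependencies.

```python
def group_nodes(node_order, groups):
    """Collapse ordered node names into execution units.

    node_order: node names in execution order (hypothetical input).
    groups: mapping of group name -> list of node names in that group.
    Nodes that belong to no group stay mapped 1:1 to their own unit.
    """
    node_to_group = {}
    for group_name, members in groups.items():
        for name in members:
            if name in node_to_group:
                raise ValueError(f"node {name!r} is in more than one group")
            node_to_group[name] = group_name

    units = {}  # dicts keep insertion order, so units follow node_order
    for name in node_order:
        unit = node_to_group.get(name, name)
        units.setdefault(unit, []).append(name)
    return units


if __name__ == "__main__":
    order = ["load", "clean", "train", "evaluate"]
    print(group_nodes(order, {"prep": ["load", "clean"]}))
```

The point of the sketch is that the grouping lives in a separate mapping, so changing which nodes share a container does not require restructuring the pipeline itself.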
