Kedro #plugins-integrations

Merel

03/27/2025, 8:31 AM

Hi @Jens Peder Meldgaard, I'm learning more about how `kedro-databricks` works and I was wondering whether it makes sense to use any of the other runners (`ThreadRunner`, `ParallelRunner`)? As far as I understand, for every node we use these run parameters: `--nodes name, --conf-source self.remote_conf_dir, --env self.env`. Would it make sense to allow adding the runner type too? Or, if you want parallel running, should you use the Databricks cluster setup for that? I'm not very familiar with all the run options in Databricks, so I'm trying to figure out where to use Kedro features and where to use Databricks. (cc: @Rashida Kanchwala)

Merel

03/27/2025, 9:02 AM

I guess I partly answered my own question, since it doesn't make sense to provide the runner argument if each node is run individually per task. But you could of course do your grouping differently and run a whole namespace or pipeline in a task. Would it then make sense to run that part with either the `ThreadRunner` or the `ParallelRunner`?
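
A minimal Python sketch of the distinction above: with one node per task a runner flag has nothing to parallelise, whereas a task that runs a group of nodes could pass one. The helper `build_task_parameters` and its `runner` argument are hypothetical and not part of `kedro-databricks`. The option names `--nodes`, `--conf-source`, `--env` and `--runner` are `kedro run` CLI options; joining several node names with commas assumes a Kedro version whose `--nodes` option accepts a comma-separated list, and all values are placeholders.

```python
from typing import Optional, Sequence


def build_task_parameters(
    node_names: Sequence[str],
    conf_source: str,
    env: str,
    runner: Optional[str] = None,
) -> list[str]:
    """Build CLI-style run parameters for one task (hypothetical helper).

    A single node name mirrors the per-node `--nodes`, `--conf-source`,
    `--env` parameters; several names model a task that runs a grouped
    set of nodes (e.g. a namespace) instead.
    """
    if not node_names:
        raise ValueError("a task must run at least one node")
    params = [
        "--nodes", ",".join(node_names),
        "--conf-source", conf_source,
        "--env", env,
    ]
    # A runner only matters when the task runs more than one node;
    # with a single node there is nothing to run concurrently, so it is dropped.
    if runner is not None and len(node_names) > 1:
        params += ["--runner", runner]
    return params


if __name__ == "__main__":
    print(build_task_parameters(["preprocess"], "/conf", "dev", runner="ThreadRunner"))
    print(build_task_parameters(["a", "b", "c"], "/conf", "dev", runner="ThreadRunner"))
```

Silently dropping the runner for single-node tasks is a choice made here for illustration; raising an error instead would make that misconfiguration more visible.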

Deepyaman Datta

03/27/2025, 1:45 PM

Without having looked into `kedro-databricks`, but based on my experience working with Spark, I would expect that you can't use the `ParallelRunner`.

Merel

03/27/2025, 2:00 PM

Hmm, yeah, good point about Spark and the `ParallelRunner`.

Jens Peder Meldgaard

03/28/2025, 2:34 PM

Hey @Merel, the idea of `kedro-databricks` is rather to generate the DAGs of Kedro pipelines as Databricks Workflows. Any type of parallelisation should therefore be implemented at the node level when used with `kedro-databricks`. If tasks can run in parallel, based on the DAG, they will run in parallel by default.

Merel

03/28/2025, 4:21 PM

> If tasks can run in parallel, based on the DAG, they will run in parallel by default.

Is that a default setting on Databricks? I haven't gone beyond the basic example with 3 nodes yet, but I will do some more experimenting next week.

Jens Peder Meldgaard

03/28/2025, 4:22 PM

Yes. The DAG is executed 100% based on the dependencies between tasks, so it should run in parallel by default wherever possible.
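
To illustrate what dependency-driven parallelism means, here is a small Python sketch using the standard library's `graphlib` (Python 3.9+) that groups tasks into "waves" of mutually independent tasks. It assumes task definitions shaped like the Databricks Jobs API `tasks` list (a `task_key` plus an optional `depends_on` list); it is not how `kedro-databricks` or Databricks schedules work internally, and the task names are hypothetical. A real scheduler starts each task as soon as its own upstream tasks finish rather than in lockstep waves, so the waves only show which tasks may overlap.

```python
from graphlib import TopologicalSorter


def parallel_waves(tasks: list[dict]) -> list[list[str]]:
    """Group tasks into waves whose members have no dependencies on each other.

    Each task is a dict with a "task_key" and an optional "depends_on" list
    of {"task_key": ...} dicts. Every task in a wave has all of its
    dependencies in earlier waves.
    """
    # Map each task to the set of tasks it depends on (its predecessors).
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


tasks = [
    {"task_key": "preprocess_companies"},
    {"task_key": "preprocess_shuttles"},
    {
        "task_key": "create_model_input_table",
        "depends_on": [
            {"task_key": "preprocess_companies"},
            {"task_key": "preprocess_shuttles"},
        ],
    },
]
print(parallel_waves(tasks))
# → [['preprocess_companies', 'preprocess_shuttles'], ['create_model_input_table']]
```

The two preprocessing tasks share no dependency, so they can run side by side; the third task waits for both.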

Merel

03/28/2025, 4:23 PM

Ah, great to know! @Rashida Kanchwala ☝️
