has anyone got the case where using ThreadPoolExecutor in a Kedro #questions

Gauthier Pierard

03/20/2025, 9:11 AM

Has anyone hit the case where using ThreadPoolExecutor in a node (running queries in parallel, for example) causes the Kedro process to hang and not terminate after execution?

datajoely

03/20/2025, 9:13 AM

So Kedro has several node-level runners:
• The parallel runner uses multiprocessing
• The thread runner uses threads (designed for external execution engines like Spark / Ibis)
If you do parallelism within a node itself, you have to use the sequential runner, as it will conflict with the two above.

Gauthier Pierard

03/20/2025, 9:13 AM

I am using

    with ThreadPoolExecutor(max_workers=10) as executor:
        future_to_query = {executor.submit(execute_single_query, q_name, q_value): q_name for q_name, q_value in queries.items()}
        for future in as_completed(future_to_query):
            query_name, data = future.result()
            if data is not None:
                results[query_name] = data
    executor.shutdown(wait=False)  # Ensure all threads are cleaned up
    del executor

And as you can see, the Kedro process doesn't complete. When I don't use ThreadPoolExecutor, this doesn't happen.
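For reference, here is a minimal self-contained sketch of the same pattern. The `execute_single_query` body and the `run_queries` wrapper are hypothetical stand-ins, not the code from the thread. One thing worth noting: leaving the `with` block already calls `executor.shutdown(wait=True)`, so the extra `shutdown(wait=False)` and `del executor` afterwards should not change anything.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def execute_single_query(q_name, q_value):
    # Hypothetical stand-in for the real JDBC call: returns (name, data).
    return q_name, f"rows for {q_value}"


def run_queries(queries, max_workers=10):
    """Run all queries in a thread pool and collect the non-empty results."""
    results = {}
    # Exiting the with block calls executor.shutdown(wait=True),
    # so no further shutdown() or del is needed afterwards.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_query = {
            executor.submit(execute_single_query, q_name, q_value): q_name
            for q_name, q_value in queries.items()
        }
        for future in as_completed(future_to_query):
            query_name, data = future.result()
            if data is not None:
                results[query_name] = data
    return results


results = run_queries({"a": "SELECT 1", "b": "SELECT 2"})
```

If a version like this terminates cleanly with a dummy query function, the hang is more likely tied to what the real query function does than to the executor itself.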

datajoely

03/20/2025, 9:14 AM

What are you executing your queries against?

Gauthier Pierard

03/20/2025, 9:14 AM

a denodo db using jdbc

datajoely

03/20/2025, 9:15 AM

Got it

datajoely

03/20/2025, 9:15 AM

In general this is already outside of the pattern that Kedro encourages with i/o living within the catalog

datajoely

03/20/2025, 9:16 AM

I need to think what’s the best option here

datajoely

03/20/2025, 9:17 AM

Are you just looking to pull data out of denodo and dump it somewhere for processing?

Gauthier Pierard

03/20/2025, 9:17 AM

I got the same problem with sequential

Gauthier Pierard

03/20/2025, 9:17 AM

pretty much yes

datajoely

03/20/2025, 9:17 AM

I’m tempted to say you’re getting none of the benefits of Kedro doing this step here

datajoely

03/20/2025, 9:18 AM

Once the data is out you’re in a great place to transform it with Kedro

datajoely

03/20/2025, 9:18 AM

But you’re going to be banging your head against the wall trying to do this within a node itself, I think

Gauthier Pierard

03/20/2025, 9:19 AM

it works perfectly fine without multithreading, just gotta find out why kedro hangs
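One generic way to investigate a hang like this: the Python interpreter waits for every non-daemon thread before exiting, so listing the threads still alive at the end of the node can point at the culprit. The sketch below uses only the standard library. The helper names and the `stuck-worker` thread are made up for illustration, and threads started outside Python's `threading` module may not be listed.

```python
import threading


def lingering_threads():
    """Return live non-daemon threads other than the main thread."""
    main = threading.main_thread()
    return [
        t for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]


def report_lingering_threads():
    """Print and return the names of threads that would block interpreter exit."""
    names = [t.name for t in lingering_threads()]
    print("non-daemon threads still alive:", names)
    return names


# Demonstration: a non-daemon thread blocked on an event shows up in the report
# until it is released and joined.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="stuck-worker")
worker.start()
names_while_running = report_lingering_threads()

stop.set()
worker.join()
names_after_join = report_lingering_threads()
```

Calling `report_lingering_threads()` as the last statement of the node (or in an after-run hook) would show whether any worker threads survive the `with` block.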

Gauthier Pierard

03/20/2025, 9:21 AM

the pipeline aggregates these results as a catalog entry

datajoely

03/20/2025, 9:22 AM

Which version of Kedro are you in?

datajoely

03/20/2025, 9:22 AM

There was a regression recently that may have introduced this

Gauthier Pierard

03/20/2025, 9:22 AM

kedro, version 0.19.10

datajoely

03/20/2025, 9:22 AM

Can you try 0.19.12 it should be safe to upgrade with no changes

Gauthier Pierard

03/20/2025, 9:23 AM

sure

Gauthier Pierard

03/20/2025, 9:28 AM

Will probably be available tomorrow in my org's internal repo. From the release notes this seems promising, thx for the tip: "Changed the execution of SequentialRunner to not use an executor pool to ensure it's single threaded."

Gauthier Pierard

03/20/2025, 10:46 AM

okay, same issue on 0.19.12

datajoely

03/20/2025, 12:01 PM

hmm

datajoely

03/20/2025, 12:01 PM

@Merel who's the wizard this week?

Merel

03/20/2025, 12:02 PM

@Rashida Kanchwala 🙂

🙏 1

Rashida Kanchwala

03/20/2025, 12:19 PM

@Gauthier Pierard, do you mind raising this as a GitHub issue? We will look into it.

Gauthier Pierard

03/20/2025, 12:19 PM

sure. thanks

Gauthier Pierard

03/20/2025, 1:03 PM

doing the same with joblib solves the issue:

from joblib import Parallel, delayed
# ...
results = Parallel(n_jobs=nb_executors)(
    delayed(execute_single_query)(q_name, q_value, input_url, denodo_user, denodo_password, driverpath, namespace)
    for q_name, q_value in queries.items()
)

❤️ 2

datajoely

03/20/2025, 1:03 PM

Simpler code too
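The joblib call above returns a list in the same order as the input queries, whereas the earlier ThreadPoolExecutor version filled a `results` dict. Below is a small sketch of how that dict could be rebuilt from the joblib output. The `execute_single_query` body and the sample `queries` are hypothetical, and `prefer="threads"` is used only to keep the sketch lightweight; the message above relies on joblib's default backend.

```python
from joblib import Parallel, delayed


def execute_single_query(q_name, q_value):
    # Hypothetical stand-in for the real query: returns (name, data),
    # with data set to None when there is nothing to return.
    if q_value is None:
        return q_name, None
    return q_name, f"rows for {q_value}"


queries = {"a": "SELECT 1", "b": None, "c": "SELECT 3"}

# Parallel returns one (name, data) pair per query, in input order.
pairs = Parallel(n_jobs=2, prefer="threads")(
    delayed(execute_single_query)(q_name, q_value)
    for q_name, q_value in queries.items()
)

# Rebuild the name -> data mapping, skipping empty results as before.
results = {name: data for name, data in pairs if data is not None}
```

Because the worker function returns the query name alongside the data, no `future_to_query` bookkeeping is needed.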
