# questions
r
How can I reduce the execution time of a kedro project? Is there anything I should be looking at?
m
Have you tried the ParallelRunner? šŸ™‚
r
In fact, my nodes execute sequentially, so I won't be able to use the ParallelRunner.
j
what do your nodes do, generally speaking? process tabular data, connect to databases, something else?
n
Many ideas!
• `pip install pandas[performance]`
• `--async` / `--parallel`
• `CachedDataSet`
• `PartitionedDataSet` with lazy loading/saving
• `yield` from a node to process data in chunks — see "Add an example in the documentation about nodes with generator functions" kedro#2170 and https://github.com/kedro-org/kedro-devrel/issues/49#issuecomment-1473735750
šŸ‘šŸ¼ 1
šŸ‘ 1
Btw, you should profile your pipeline to find out the bottleneck first. https://github.com/joerick/pyinstrument
šŸ‘ 1
šŸ‘šŸ¼ 1
r
@Juan Luis I have 5 nodes connected to each other; each one manipulates the dataframe and loads and saves the various results.
j
@Rachid Cherqaoui good to know - are you using pandas, PySpark, or something else?
r
pandas, XGBoost, and FastAPI
j
have a look at https://pythonspeed.com/datascience/#pandas or, alternatively, switch to https://www.pola.rs/
šŸ‘ 1
r
Thank you
d
Are you loading and saving all your datasets to physical catalog entries? You will incur an I/O bottleneck writing to disk, especially if they're large files. In addition to what everybody mentioned, I'd say that performance issues are usually not Kedro-related; Kedro is pretty smart about not adding overhead to the underlying calls. (There are occasional issues caused by something in Kedro; profiling usually finds that, and then we have to fix it.)
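Concretely: any dataset you leave out of the catalog is kept in memory between nodes (an in-memory dataset) rather than written to disk. A hedged sketch of a `catalog.yml` along those lines (entry names and paths are illustrative, and dataset class names vary slightly across Kedro versions):

```yaml
# catalog.yml -- only persist what you actually need on disk.
raw_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/raw.csv

model_output:
  type: pandas.ParquetDataSet
  filepath: data/07_model_output/output.parquet

# Intermediate datasets ("cleaned_data", "features", ...) are simply not
# listed here, so Kedro keeps them in memory between nodes instead of
# writing them to disk after every step.
```

Dropping large intermediate files from the catalog is often the cheapest speedup available, at the cost of not being able to inspect those intermediates afterwards.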