#questions

Rachid Cherqaoui

07/18/2023, 3:00 PM
How can I reduce the execution time of a kedro project? Is there anything I should be looking at?

Marc Gris

07/18/2023, 3:01 PM
Have you tried the ParallelRunner? 🙂

Rachid Cherqaoui

07/18/2023, 3:02 PM
in fact my different nodes execute sequentially so I won't be able to use the ParallelRunner

Juan Luis

07/18/2023, 3:05 PM
what do your nodes do, generally speaking? process tabular data, connect to databases, something else?

Nok Lam Chan

07/18/2023, 3:18 PM
Many ideas!
• pip install pandas[performance]
• --async / --parallel
• CachedDataSet
• PartitionedDataSet - Lazy loading/saving
• yield node to process data in chunks - Add an example in the documentation about nodes with generator functions kedro#2170 https://github.com/kedro-org/kedro-devrel/issues/49#issuecomment-1473735750
👍🏼 1
👍 1
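
Two of the ideas above, lazy loading of partitions and yielding data in chunks, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names (`total_per_partition`, `iter_chunks`, `double_in_chunks`) are hypothetical and not part of Kedro. The sketch relies on PartitionedDataSet handing a node a dictionary that maps partition ids to load functions. A generator node, as discussed in the linked issue, would additionally need an output dataset that can accept one chunk at a time.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List


def total_per_partition(
    partitions: Dict[str, Callable[[], List[float]]]
) -> Dict[str, float]:
    """Sum each partition, loading only one partition at a time."""
    totals = {}
    for name in sorted(partitions):
        values = partitions[name]()  # the partition is only read here
        totals[name] = sum(values)
    return totals


def iter_chunks(rows: Iterable[Any], chunk_size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most chunk_size rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def double_in_chunks(
    rows: Iterable[float], chunk_size: int = 2
) -> Iterator[List[float]]:
    """Generator-style node body: process and yield one chunk at a time."""
    for chunk in iter_chunks(rows, chunk_size):
        yield [value * 2 for value in chunk]
```

Because each function pulls data only when it needs it, peak memory is bounded by the largest partition or chunk rather than by the whole dataset.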
Btw, you should profile your pipeline to find out the bottleneck first. https://github.com/joerick/pyinstrument
👍 1
👍🏼 1
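
A minimal sketch of the "profile first" advice. It uses the standard library's `cProfile` as a stand-in for pyinstrument so that it runs without extra installs; pyinstrument itself is used differently and is not shown here. The functions `slow_step`, `fast_step`, `run_pipeline` and `profile_call` are hypothetical placeholders for real node functions.

```python
import cProfile
import io
import pstats
import time


def slow_step() -> int:
    """Placeholder for an expensive node."""
    time.sleep(0.05)
    return sum(range(1000))


def fast_step() -> int:
    """Placeholder for a cheap node."""
    return 1


def run_pipeline() -> int:
    """Run the placeholder nodes one after another."""
    return slow_step() + fast_step()


def profile_call(func):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains appear first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(run_pipeline)
    print(report)
```

The report lists functions by cumulative time, which should make it clear whether the time goes into computation or into loading and saving.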

Rachid Cherqaoui

07/18/2023, 3:43 PM
@Juan Luis I have 5 nodes chained one after another. Each one manipulates the dataframe, and the intermediate results are saved and loaded between nodes.

Juan Luis

07/18/2023, 3:49 PM
@Rachid Cherqaoui good to know - are you using pandas, PySpark, or something else?

Rachid Cherqaoui

07/18/2023, 3:50 PM
pandas, XGBoost and FastAPI

Juan Luis

07/18/2023, 3:54 PM
have a look at https://pythonspeed.com/datascience/#pandas or, alternatively, switch to https://www.pola.rs/
👍 1

Rachid Cherqaoui

07/18/2023, 3:55 PM
Thank you

Deepyaman Datta

07/18/2023, 6:16 PM
Are you loading and saving all your datasets to physical catalog entries? You will incur an I/O bottleneck writing to disk, especially if they're large files. In addition to what everybody mentioned, I'd say that performance issues are usually not Kedro-related; Kedro is pretty smart about not adding overhead to the underlying calls. (There are occasional issues caused by something in Kedro; profiling usually finds those, and then we have to fix them.)
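
One rough way to check whether persisting intermediate results is the bottleneck is to time a save and load round trip on its own. The sketch below does this with `pickle` and a temporary directory. The helper names and the synthetic rows are assumptions for illustration, and real datasets (CSV, Parquet) will show different timings. In Kedro, a dataset that is not declared in the catalog is kept in memory between nodes, so dropping catalog entries for intermediate results avoids this round trip.

```python
import pickle
import tempfile
import time
from pathlib import Path


def roundtrip_via_disk(data, directory):
    """Save data to disk, load it back, and time the whole round trip."""
    path = Path(directory) / "intermediate.pkl"
    start = time.perf_counter()
    with path.open("wb") as handle:
        pickle.dump(data, handle)
    with path.open("rb") as handle:
        loaded = pickle.load(handle)
    return loaded, time.perf_counter() - start


def measure_io_overhead(n_rows=100_000):
    """Return whether the data survived the round trip and the seconds spent."""
    # Synthetic rows standing in for an intermediate result.
    data = [{"id": i, "value": i * 0.5} for i in range(n_rows)]
    with tempfile.TemporaryDirectory() as directory:
        loaded, seconds = roundtrip_via_disk(data, directory)
    return loaded == data, seconds


if __name__ == "__main__":
    intact, seconds = measure_io_overhead()
    print(f"round trip intact={intact}, took {seconds:.3f}s")
```

If this time is a large share of the total run time, keeping intermediate datasets in memory is likely to help more than tuning the node code.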