Hello Everyone, I have a question about memory man...
# questions
j
Hello everyone, I have a question about memory management while using Kedro. I have a Kedro project that consists of 2 pipelines (data_processing_pipeline & ML_pipeline). My data processing is done with Spark, which gets initialized through Kedro hooks. At the end of my data_processing pipeline the results are written to disk as a SparkDataset. My issue is that when I execute a `kedro run` and Kedro has finished the data_processing pipeline and moved on to the ML pipeline, the Spark session is still holding on to the memory it used during processing. I know this because 20 minutes into the ML portion I can kill the Spark worker from the Spark UI, and this releases a significant amount of memory. My question is this: how do I tell Kedro to release objects that are no longer needed from memory (the dataset is not used beyond the data_processing step)?
m
I think the cleanest way to do this is to split the project into 2 Kedro projects: a Spark one and a non-Spark one. Alternatively, you can use an `after_node_run` hook to stop the Spark session once the last node that requires Spark has completed.
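A minimal sketch of that hook idea, not a tested recipe. It assumes PySpark 3.x (for `SparkSession.getActiveSession()`) and that you know the name of the last node that needs Spark. The class name `StopSparkHooks`, the `last_spark_node` and `stop_session` parameters, and the node name `write_features` are all made up for illustration. The `ImportError` fallback is only there so the snippet can be imported without Kedro installed.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch imports without Kedro
    def hook_impl(func):
        return func


class StopSparkHooks:
    """Stop the Spark session after the last Spark-dependent node (hypothetical helper)."""

    def __init__(self, last_spark_node, stop_session=None):
        # Name of the final node that still needs Spark (an assumption you supply).
        self._last_spark_node = last_spark_node
        # Injectable for testing; defaults to stopping the active PySpark session.
        self._stop_session = stop_session or self._stop_active_session
        self.stopped = False

    @staticmethod
    def _stop_active_session():
        # Imported lazily so the class can be defined without PySpark present.
        from pyspark.sql import SparkSession

        session = SparkSession.getActiveSession()
        if session is not None:
            session.stop()

    @hook_impl
    def after_node_run(self, node):
        # Kedro calls this after every node; act only once, on the chosen node.
        if node.name == self._last_spark_node and not self.stopped:
            self._stop_session()
            self.stopped = True
```

You would register it in `settings.py` with something like `HOOKS = (StopSparkHooks(last_spark_node="write_features"),)`. One caveat: any later node that still touches a Spark DataFrame will fail once the session is stopped, so this only works if the ML pipeline reads the saved output with a non-Spark dataset.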
j
@Matthias Roels thank you for your response. I didn’t think about a hook to stop the spark session. I think I will give that a try!
n
> My issue is that when I execute a `kedro run` and Kedro has finished the data_processing pipeline and moved on to the ML pipeline, the Spark session is still holding on to the memory it used during processing. I know this because 20 minutes into the ML portion I can kill the Spark worker from the Spark UI, and this releases a significant amount of memory.

Does the Spark session hold on to memory for as long as the session is alive?

> My question is this: how do I tell Kedro to release objects that are no longer needed from memory (the dataset is not used beyond the data_processing step)?

A Kedro node normally does not hold on to objects it no longer needs. Once there are no references left to an object, it is up to Python's garbage collection to clean it up.
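To illustrate the garbage-collection point, here is a small plain-Python sketch (no Kedro involved; the `Result` class and `run_node` function are made-up stand-ins). It shows that an object is reclaimed once its last reference is dropped. Note this only covers Python-side objects; memory held by the Spark JVM is outside Python's garbage collector.

```python
import gc
import weakref


class Result:
    """Stand-in for a node output (hypothetical)."""


def run_node():
    # Pretend this is a node returning some data.
    return Result()


output = run_node()
probe = weakref.ref(output)  # observe the object without keeping it alive

del output    # drop the last strong reference
gc.collect()  # only needed for reference cycles; harmless here

is_released = probe() is None  # True once the object has been reclaimed
```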