Alexis Drakopoulos
09/02/2024, 9:04 AM
get_spark() -> SparkSession: ...
I have in my catalog:
spark_session:
  type: MemoryDataset
  copy_mode: assign
then my node is node(func=get_spark, outputs="spark_session")
and I get this error:
[CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Is there another way to pass the session around to make it available to my nodes? Maybe I should be doing this in hooks?
edit: I found https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#initialise-a-sparksession-using-a-hook, which might just do this
Merel
09/02/2024, 4:18 PM