Flavien
05/03/2023, 10:26 AM
kedro project on Databricks (and have good hope to convince my team to go for kedro). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the object spark, the SparkSession provided directly in the Databricks notebooks. Is there any way to do so?
Juan Luis
05/03/2023, 10:31 AM
Juan Luis
05/03/2023, 10:32 AM
SparkSession directly, let me check
Juan Luis
05/03/2023, 10:34 AM
Flavien
05/03/2023, 10:39 AM
SparkHiveDataSet which gets the session through SparkSession.builder.getOrCreate().
I am not at all familiar with the inner workings of Spark and Databricks. Since Databricks provides a dedicated, properly set-up session, I would have liked it to be used when running:
with KedroSession.create() as session:
    session.run(pipeline_name="mon_petit_pipeline")
as per the documentation.
Flavien
05/03/2023, 10:41 AM
locals().get("spark") without success.
Juan Luis
05/03/2023, 12:12 PM
• SparkHiveDataSet has a static _get_spark method that, as you spotted, gets the existing session or creates a new one. If I understand correctly how Databricks works, this should get the appropriate SparkSession, even if it's created by Databricks.
• when passing a DataFrame around, you can access its .sparkSession property, which yields the session that created it.
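A minimal sketch of the two options above, not Kedro's own implementation. `get_spark` mirrors the get-or-create call, and `session_of` reads the session off a DataFrame that a node receives. Both function names are hypothetical, `DataFrame.sparkSession` assumes PySpark 3.3 or later, and the `SimpleNamespace` stand-in is there only so the snippet runs without a cluster.

```python
from types import SimpleNamespace


def get_spark():
    """Return the active SparkSession, creating one only if none exists.

    Assumption: on Databricks this is expected to hand back the session
    that the notebook exposes as `spark`.
    """
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()


def session_of(df):
    """Return the SparkSession that created `df` (via its `sparkSession` property)."""
    return df.sparkSession


# Hypothetical stand-in objects so the sketch can be run without Spark.
fake_session = object()
fake_df = SimpleNamespace(sparkSession=fake_session)
assert session_of(fake_df) is fake_session
```

Inside a node, `session_of(df)` avoids any global lookup; `get_spark()` is for code that has no DataFrame at hand.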
I'm mindful these answers are somewhat generic (I promise I generated them with my own brain and not ChatGPT) but, if none of them are quite what you were looking for, would you mind sharing a bit more detail?
Flavien
05/03/2023, 12:26 PM
kedro in my team (mentioning kedro every week for a year now).
Juan Luis
05/03/2023, 12:53 PM