Flavien
05/03/2023, 10:26 AM
kedro project on Databricks (and have good hope to convince my team to go for kedro). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the object spark, the SparkSession provided directly in the Databricks notebooks. Is there any way to do so?
Juan Luis
05/03/2023, 10:31 AM
SparkSession directly, let me check
Flavien
05/03/2023, 10:39 AM
SparkHiveDataSet which is getting the session through SparkSession.builder.getOrCreate().
I am not familiar at all with the inner workings of Spark and Databricks. As Databricks provides a dedicated, properly set-up session, I would have liked it to be used when running:
with KedroSession.create() as session:
    session.run(pipeline_name="mon_petit_pipeline")
as per the documentation.
locals().get("spark") without success.
Juan Luis
05/03/2023, 12:12 PM
• SparkHiveDataSet has a static _get_spark method that, as you spotted, gets the existing session or creates a new one. If I understand correctly how Databricks works, this should get the appropriate SparkSession, even if it's created by Databricks.
• when passing a DataFrame around, you can access its .sparkSession property, which yields the session that created it.
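The two options above can be sketched roughly as follows. This is only an illustration, not kedro code: resolve_spark_session is a hypothetical helper name, the .sparkSession property assumes a reasonably recent PySpark (3.3 or later), and the idea that getOrCreate() hands back the notebook's own spark object on Databricks is the assumption made in this thread, not something verified here.

```python
from typing import Any, Optional


def resolve_spark_session(df: Optional[Any] = None) -> Any:
    """Return the SparkSession tied to `df`, or the active one if `df` is None.

    Hypothetical helper for illustration; it is not part of kedro.
    """
    if df is not None:
        # DataFrame.sparkSession is the session that created this DataFrame.
        return df.sparkSession

    # Imported lazily so the DataFrame branch above does not need this import.
    from pyspark.sql import SparkSession

    # getOrCreate() returns the already-active session when there is one
    # (assumed here to be the Databricks-provided `spark`), else builds one.
    return SparkSession.builder.getOrCreate()
```

Inside a node that receives a DataFrame, resolve_spark_session(df) avoids touching the builder at all; calling it with no argument falls back to the same getOrCreate() path that SparkHiveDataSet uses.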
I'm mindful these answers are somewhat generic (I promise I generated them with my own brain and not ChatGPT 😬) but, if none of them is quite what you were looking for, would you mind sharing a bit more detail?
Flavien
05/03/2023, 12:26 PM
kedro in my team, mentioning kedro every week for a year now.
Juan Luis
05/03/2023, 12:53 PM