Hi fellows I am running a `kedro` project on Databricks and Kedro #questions


Flavien

05/03/2023, 10:26 AM

Hi fellows, I am running a `kedro` project on Databricks (and have good hopes of convincing my team to go for `kedro`). The documentation is very well written, thanks for that. Scrolling through the messages in Slack, I did not find a way to directly use the `spark` object, the `SparkSession` provided directly in the Databricks notebooks. Is there any way to do so?

❤️ 5

Juan Luis

05/03/2023, 10:31 AM

hi @Flavien, thanks for the shoutout! props to @Jannic Holzer and @Jo Stichbury for those pages, they took a lot of effort to write 🙌🏼

Juan Luis

05/03/2023, 10:32 AM

about using the `SparkSession` directly, let me check

Juan Luis

05/03/2023, 10:34 AM

were you thinking of accessing it from within a node?

Flavien

05/03/2023, 10:39 AM

No, I had in mind its "implicit" usage in the data sets. I am using a `SparkHiveDataSet`, which gets the session through `SparkSession.builder.getOrCreate()`. I am not familiar at all with the inner workings of Spark and Databricks. As Databricks provides a dedicated, properly set-up session, I would have liked it to be used when running:

with KedroSession.create() as session:
    session.run(pipeline_name="mon_petit_pipeline")

as per the documentation.

👀 1

Flavien

05/03/2023, 10:41 AM

I tried to "trick" the data set by loading the object with `locals().get("spark")`, without success.
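
One plausible reason the `locals()` trick fails, assuming the lookup was made from inside data set code rather than in a notebook cell: `locals()` and `globals()` only see the namespace of the function or module they are called from, not the notebook's. The sketch below is pure Python; the `notebook` and `library` modules and the `find_spark` helper are hypothetical stand-ins, and the `spark` value is a placeholder string, not a real `SparkSession`.

```python
import types

# Stand-in for the notebook namespace, where Databricks predefines `spark`.
notebook = types.ModuleType("notebook")
notebook.spark = "databricks-session"  # placeholder, not a real SparkSession

# Stand-in for library code (for example a data set class) in another module.
library = types.ModuleType("library")
exec(
    "def find_spark():\n"
    "    # locals() is this function's namespace; globals() is this module's.\n"
    "    return locals().get('spark'), globals().get('spark')\n",
    library.__dict__,
)

# Neither lookup reaches into the notebook's namespace.
print(library.find_spark())         # (None, None)
# The name is only visible in the namespace that defines it.
print(vars(notebook).get("spark"))  # databricks-session
```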

Juan Luis

05/03/2023, 12:12 PM

@Flavien I see two ways to access the `SparkSession`:
• the `SparkHiveDataSet` has a static `_get_spark` method that, as you spotted, gets the existing session or creates one. if I understand correctly how Databricks works, this should get the appropriate `SparkSession`, even if it's created by Databricks.
• when passing a `DataFrame` around, you can access its `.sparkSession` property, which yields the session that created it.
I'm mindful these answers are somewhat generic (I promise I generated them with my own brain and not ChatGPT 😬) but, if none of them are quite what you were looking for, would you mind sharing a bit more detail?
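
The two approaches above can be sketched as follows. This is a minimal illustration, not Kedro's own code: the helper names `get_or_create_session`, `session_of` and `same_session` are hypothetical, and it assumes a PySpark version where `DataFrame.sparkSession` is a public property (3.3 or later, to my knowledge).

```python
def get_or_create_session():
    """Return the active SparkSession, or build one if none exists yet."""
    # Imported lazily so this file still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.getOrCreate()


def session_of(df):
    """Return the session that created the given DataFrame."""
    return df.sparkSession


def same_session(a, b):
    """True when both names refer to the very same session object."""
    return a is b


# In a Databricks notebook cell, where `spark` is predefined, one could check:
#   same_session(get_or_create_session(), spark)
#   same_session(session_of(spark.range(1)), spark)
```

If both checks come back `True` in the notebook, the data sets are reusing the session Databricks set up rather than creating a new one.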

Flavien

05/03/2023, 12:26 PM

Thanks @Juan Luis. Checking the sessions, they indeed seem to be identical. One step closer to using `kedro` in my team (I have been mentioning `kedro` every week for a year now 😅).

🎉 2

Juan Luis

05/03/2023, 12:53 PM

awesome! feel free to keep dropping #C03RKP2LW64 💪🏼
