#questions

Michal Szlupowicz

03/14/2023, 11:57 AM
Hi guys. I'm trying to load data from Snowflake into a SparkDataSet using the data catalog. We thought that SparkJDBCDataSet was the proper way of doing that, but I'm struggling to set up the connection drivers. Could someone advise me how to set it up or suggest another solution?

datajoely

03/14/2023, 12:14 PM
So we’ve not cut a release yet, but you can use the recently contributed SnowParkDataset as a custom dataset if you want: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/snowflake/snowpark_dataset.py
• Copy and paste it into your project as a .py file
• Ensure the "snowflake-snowpark-python~=1.0.0" and "pyarrow~=8.0" dependencies are installed
• Reference the class path in your catalog
👍 1
It will be released shortly too, if you want it natively.
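For illustration, the three steps above could end up as a catalog entry shaped like the sketch below. It is written as a Python dict that mirrors what would go in `catalog.yml`. Everything except the `type` key is an assumption: `my_project`, the module path, the class name `SnowparkTableDataSet`, the `table_name` and `credentials` keys and their values are hypothetical, so check them against the file you copied.

```python
# Minimal sketch of a catalog entry for a copied-in custom dataset.
# All names below are hypothetical placeholders, not confirmed Kedro API.

def snowpark_catalog_entry(class_path, table_name, credentials_key):
    """Build a catalog entry dict that points `type` at a custom dataset class."""
    return {
        # Dotted path to the class inside the .py file copied into the project.
        "type": class_path,
        # Assumed constructor arguments; verify against the copied class.
        "table_name": table_name,
        "credentials": credentials_key,
    }


catalog = {
    "my_snowflake_table": snowpark_catalog_entry(
        class_path="my_project.extras.datasets.snowpark_dataset.SnowparkTableDataSet",
        table_name="MY_TABLE",
        credentials_key="snowflake",
    )
}
```

The same keys would be written as YAML under the dataset name in the project's catalog file.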

Michal Szlupowicz

03/14/2023, 12:17 PM
Let me try that

Yetunde

03/14/2023, 12:34 PM
And you'll need Python 3.8 for this to work; snowflake-snowpark-python~=1.0.0 does not support later versions of Python.
👍 1

Michal Szlupowicz

03/14/2023, 5:28 PM
Hey. So SnowParkDataset gives me a Snowpark Table when it loads. How can I convert it to a PySpark DataFrame?

datajoely

03/14/2023, 5:40 PM
Ah, Snowpark is Snowflake's competitor to Spark, so they’re not super compatible.
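One possible workaround, not suggested in the thread itself, is to go through pandas. The sketch below assumes the loaded object exposes Snowpark's `to_pandas()` and that `spark` is an existing `SparkSession`. The helper name `snowpark_to_spark` is hypothetical. This route collects the whole table into local memory first, so it is only reasonable for small tables.

```python
# Minimal sketch: Snowpark Table -> pandas -> PySpark DataFrame.
# `snowpark_to_spark` is a hypothetical helper, not part of Kedro or Snowpark.

def snowpark_to_spark(snowpark_table, spark):
    """Convert a Snowpark table to a PySpark DataFrame via pandas.

    Pulls all rows into local memory, so it does not scale to large tables.
    """
    # snowflake.snowpark.DataFrame.to_pandas() returns a pandas DataFrame.
    pandas_df = snowpark_table.to_pandas()
    # pyspark.sql.SparkSession.createDataFrame() accepts a pandas DataFrame.
    return spark.createDataFrame(pandas_df)
```

Inside a node this would be called as `snowpark_to_spark(table, spark)`, with `table` coming from the catalog and `spark` from the project's Spark session.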