# questions
Hello everyone! I'm trying to use `SparkDataset` to read and write to the Azure Data Lake File System, using the `abfs://` prefix. I noticed that, although the dataset requires credentials to be passed in the init method, these credentials are not used when writing, requiring the Spark session to be configured globally. This seems a bit out of line with the Kedro standard, as it doesn't allow us to have datasets from multiple sources. Shouldn't we be using these credentials directly when writing and reading, without relying on the global Spark configuration?
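For reference, this is roughly the kind of setup I mean (account name, container, paths and credential keys below are placeholders, not my real config):

```python
# Hypothetical sketch of the dataset setup described above; all names are placeholders.
from kedro_datasets.spark import SparkDataset

cars = SparkDataset(
    filepath="abfss://container@myaccount.dfs.core.windows.net/data/cars.parquet",
    file_format="parquet",
    # Credentials are accepted at init time, but they don't seem to be applied on save:
    credentials={"account_name": "myaccount", "account_key": "<storage-account-key>"},
)
```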
Hi @Júlio Resende, as far as I understand:
• When you use `abfs://` or `abfss://`, Spark defers to the Azure Hadoop connector to handle authentication.
• That connector ignores per-writer `.options(...)` for authentication and only checks the Hadoop configuration.
• So unless you've already set these globally on the Spark session (see the sketch below), `.save()` will fail to authenticate, even if you passed `credentials` into the Kedro dataset.
• For some formats (e.g. JDBC, S3 connectors), Spark allows passing authentication tokens directly as reader options. That's why Kedro's `SparkDataSet` supports merging `credentials` into `.load()`.
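In other words, something along these lines has to be set on the session itself. This is the account-key flavour of ABFS auth (other auth modes use different `fs.azure.*` properties), and the account name and key are placeholders:

```python
# Sketch: account-key ABFS auth set globally on the Spark session (placeholder values).
# The Azure Hadoop connector reads fs.azure.account.key.<account>.dfs.core.windows.net
# from the Hadoop/Spark configuration; per-writer .options(...) are not consulted.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set(
    "fs.azure.account.key.myaccount.dfs.core.windows.net",  # placeholder account name
    "<storage-account-key>",  # placeholder secret, normally injected from credentials
)

# With the session configured, a plain write to abfss:// can authenticate:
# df.write.save("abfss://container@myaccount.dfs.core.windows.net/data/out.parquet")
```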
Reading: works with credentials in the dataset, because Kedro passes them as options to `DataFrameReader`.
Writing: those same credentials don't get passed to `DataFrameWriter`; Spark tries to resolve the ABFS path and falls back to the Hadoop configs.
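A rough sketch of that asymmetry, taking the above at face value (paths and credential keys are placeholders):

```python
# Sketch of the read/write asymmetry described above; paths and keys are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "abfss://container@myaccount.dfs.core.windows.net/data/table.parquet"
credentials = {"fs.azure.account.key.myaccount.dfs.core.windows.net": "<key>"}

# Load: dataset credentials end up as reader options.
df = spark.read.options(**credentials).load(path, format="parquet")

# Save: the writer gets no such options, so the ABFS path is resolved against
# whatever authentication is already in the global Hadoop/Spark configuration.
df.write.mode("overwrite").save(path, format="parquet")
```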
Hi @Júlio Resende, things are slightly different with Spark indeed. Part of the reason is that Spark has its own authentication mechanism, which is different from Kedro's (`fsspec`-based) one. You can still have multiple `spark.yml` configurations to keep different sets of Spark credentials, though that's not as granular as dataset-level credentials.
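For example, one common pattern is to apply `spark.yml` to the session in a hook. The sketch below assumes the project's config loader resolves a `spark` pattern per environment, so switching environments switches which credentials apply:

```python
# Sketch: loading a per-environment spark.yml and applying it to the Spark session.
# Assumes context.config_loader["spark"] resolves the spark.yml of the active
# environment (e.g. conf/base vs conf/azure); the app name below is a placeholder.
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        # One global session per run; its credentials come from whichever
        # environment's spark.yml was loaded above.
        (
            SparkSession.builder.appName("my-kedro-project")
            .config(conf=spark_conf)
            .getOrCreate()
        )
```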
Thank you! I created a custom dataset that uses the credentials in the `.options(...)` method.
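In case it's useful to others, a minimal sketch of what such a dataset can look like. The class name and option keys are illustrative, and versioning/error handling are left out:

```python
# Sketch: a custom dataset that merges credentials into .options(...) on both
# the reader and the writer. All names here are illustrative, not the real code.
from typing import Any, Optional

from kedro.io import AbstractDataset
from pyspark.sql import DataFrame, SparkSession


class SparkOptionsDataset(AbstractDataset[DataFrame, DataFrame]):
    def __init__(
        self,
        filepath: str,
        file_format: str = "parquet",
        credentials: Optional[dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        self._file_format = file_format
        self._credentials = credentials or {}

    def _load(self) -> DataFrame:
        spark = SparkSession.builder.getOrCreate()
        return (
            spark.read.options(**self._credentials)
            .format(self._file_format)
            .load(self._filepath)
        )

    def _save(self, data: DataFrame) -> None:
        # Same trick on the write side: merge the credentials into the writer options.
        (
            data.write.options(**self._credentials)
            .format(self._file_format)
            .mode("overwrite")
            .save(self._filepath)
        )

    def _describe(self) -> dict[str, Any]:
        return {"filepath": self._filepath, "file_format": self._file_format}
```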