Toni - TomTom - Madrid (03/25/2024, 12:14 PM):
search_sessions_logs:
  type: spark.SparkDataset
  filepath: abfss://bronze@adlsmapsanalyticspoi.dfs.core.windows.net/external_sources/search_logs_amigo
  file_format: delta
Toni - TomTom - Madrid (03/25/2024, 12:27 PM):
TypeError: DatabricksFileSystem.__init__() missing 2 required positional arguments: 'instance' and 'token'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/kedro", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/kedro/framework/cli/cli.py", line 198, in main
cli_collection()
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/kedro/framework/cli/cli.py", line 127, in main
super().main(
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/kedro/framework/cli/project.py", line 225, in run
session.run(
File "/usr/local/lib/python3.10/site-packages/kedro/framework/session/session.py", line 374, in run
catalog = context._get_catalog(
File "/usr/local/lib/python3.10/site-packages/kedro/framework/context/context.py", line 231, in _get_catalog
catalog: DataCatalog = settings.DATA_CATALOG_CLASS.from_config(
File "/usr/local/lib/python3.10/site-packages/kedro/io/data_catalog.py", line 294, in from_config
datasets[ds_name] = AbstractDataset.from_config(
File "/usr/local/lib/python3.10/site-packages/kedro/io/core.py", line 164, in from_config
raise DatasetError(
kedro.io.core.DatasetError:
DatabricksFileSystem.__init__() missing 2 required positional arguments: 'instance' and 'token'.
Dataset 'search_sessions_logs' must only contain arguments valid for the constructor of 'kedro_datasets.spark.spark_dataset.SparkDataset'.
Juan Luis (03/25/2024, 1:33 PM):
Where does DatabricksFileSystem come from?

Nok Lam Chan (03/26/2024, 4:24 PM):
Have you tried replacing dbfs:// with /dbfs/? Please also share what you put in filepath, since the example you shared above doesn't seem to match the error message here.

Nok Lam Chan (03/26/2024, 4:25 PM):
I'm not sure why it's using DatabricksFileSystem to start with; IIRC the dbfs filesystem provided by fsspec is not useful. In addition, Spark has a native way to access remote storage, so it shouldn't even use fsspec.

Nok Lam Chan (03/26/2024, 4:27 PM):
dbfs:/ should be equivalent to /dbfs on Databricks.
Nok Lam Chan (03/27/2024, 11:48 AM):
search_sessions_logs:
  type: spark.SparkDataset
  filepath: dbfs:/bronze@adlsmapsanalyticspoi.dfs.core.windows.net/external_sources/search_logs_amigo
  file_format: delta
And I only get a Java error. I don't think you need to put any credentials, since Spark authenticates in its own way. If it's triggering fsspec then it's very likely a bug, but I would appreciate it if you could share an example that we can reproduce.
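If the native-Spark route works in your cluster, the abfss:// URI can stay exactly as in the original catalog entry and authentication moves into the Spark configuration rather than fsspec credentials. A sketch, not a verified setup: the key name below follows the Hadoop ABFS driver convention (fs.azure.account.key.<account>.dfs.core.windows.net), and it assumes the project loads conf/base/spark.yml into the SparkSession builder (as Kedro's PySpark starter hook does).

```yaml
# conf/base/spark.yml -- hypothetical sketch; assumes a project hook
# feeds these keys into SparkSession.builder.config(...)
spark.hadoop.fs.azure.account.key.adlsmapsanalyticspoi.dfs.core.windows.net: <account-key>
```

With that in place, the catalog entry from the top of the thread should not need any credentials block at all, since Spark resolves abfss:// natively.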
Nok Lam Chan (04/01/2024, 10:45 PM):
Under the hood it's fsspec; specifically for Azure it will be https://github.com/fsspec/adlfs. I didn't find anything mentioning wasbs, so chances are it's not supported.
Guillermo Caminero (04/02/2024, 7:36 AM):
We use the account_name and the sas_token, but you can also use the account_name and the account_key, or a service principal. If you set the credentials with these names, Kedro is able to pass them to fsspec, as the dataset has a credentials entry to send them through. You can find examples of how to do it manually here (but with Kedro it's only configuration): https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/tutorial-spark-pool-filesystem-spec
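For reference, a hedged sketch of what that configuration could look like in Kedro. The entry name azure_poi_storage is illustrative, not from the thread; adlfs accepts account_name together with sas_token, with account_key, or with service-principal fields (client_id/client_secret/tenant_id).

```yaml
# conf/base/catalog.yml -- sketch, assuming the thread's dataset
search_sessions_logs:
  type: spark.SparkDataset
  filepath: abfss://bronze@adlsmapsanalyticspoi.dfs.core.windows.net/external_sources/search_logs_amigo
  file_format: delta
  credentials: azure_poi_storage

# conf/local/credentials.yml
azure_poi_storage:
  account_name: adlsmapsanalyticspoi
  sas_token: <your-sas-token>   # or account_key, or service-principal fields
```

Kedro forwards the credentials dict to fsspec/adlfs when it resolves the abfss:// path, so no manual fsspec calls are needed.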