Raghunath Nair
11/22/2023, 12:17 PM
Juan Luis
11/22/2023, 12:24 PM
Nok Lam Chan
11/22/2023, 1:39 PM
Nok Lam Chan
11/22/2023, 1:48 PM
fsspec to handle different storage systems, but there are issues with fsspec, so we add support for abfss manually.
Raghunath Nair
11/22/2023, 4:12 PM
Raghunath Nair
11/22/2023, 4:13 PM
Juan Luis
11/22/2023, 4:14 PM
Raghunath Nair
11/22/2023, 4:16 PM
Raghunath Nair
11/22/2023, 4:17 PM
Raghunath Nair
11/22/2023, 4:18 PM
Juan Luis
11/22/2023, 4:19 PM
Juan Luis
11/22/2023, 4:19 PM
Raghunath Nair
11/22/2023, 4:19 PM
Raghunath Nair
11/22/2023, 4:22 PM
Raghunath Nair
11/22/2023, 4:22 PM
Raghunath Nair
11/22/2023, 4:23 PM
abfss couldn’t be found
Juan Luis
11/22/2023, 4:25 PM
pip install adlfs and try again?
Raghunath Nair
11/22/2023, 4:25 PM
Juan Luis
11/22/2023, 4:25 PM
Raghunath Nair
11/22/2023, 4:25 PM
Raghunath Nair
11/22/2023, 4:26 PM
Raghunath Nair
11/22/2023, 4:26 PM
Raghunath Nair
11/22/2023, 4:27 PM
Raghunath Nair
11/22/2023, 4:33 PM
Nok Lam Chan
11/22/2023, 4:48 PM
Raghunath Nair
11/22/2023, 4:49 PM
Nok Lam Chan
11/22/2023, 4:50 PM
Nok Lam Chan
11/22/2023, 4:51 PM
Raghunath Nair
11/22/2023, 4:52 PM
Nok Lam Chan
11/22/2023, 4:58 PM
adlfs at least for fsspec
https://github.com/fsspec/filesystem_spec/blob/f7b454e544de7f2e5bc8ab737219e34e6282bdb5/fsspec/registry.py#L136-L138
Raghunath Nair
11/22/2023, 5:01 PM
Nok Lam Chan
11/22/2023, 5:01 PM
Raghunath Nair
11/22/2023, 5:01 PM
Raghunath Nair
11/22/2023, 5:05 PM
Raghunath Nair
11/22/2023, 5:07 PM
Raghunath Nair
11/22/2023, 5:09 PM
Raghunath Nair
11/22/2023, 5:09 PM
Nok Lam Chan
11/22/2023, 5:10 PM
Raghunath Nair
11/22/2023, 5:10 PM
Nok Lam Chan
11/22/2023, 5:10 PM
Raghunath Nair
11/22/2023, 5:11 PM
Nok Lam Chan
11/22/2023, 5:11 PM
Raghunath Nair
11/22/2023, 5:11 PM
Raghunath Nair
11/22/2023, 5:12 PM
Nok Lam Chan
11/22/2023, 5:12 PM
Nok Lam Chan
11/22/2023, 5:13 PM
dataset problem or is there something going wrong in kedro’s core?
Nok Lam Chan
11/22/2023, 5:13 PM
Raghunath Nair
11/22/2023, 5:14 PM
Raghunath Nair
11/22/2023, 5:14 PM
Raghunath Nair
11/22/2023, 5:14 PM
Nok Lam Chan
11/22/2023, 5:14 PM
Raghunath Nair
11/22/2023, 5:14 PM
Nok Lam Chan
11/22/2023, 5:14 PM
Raghunath Nair
11/22/2023, 5:15 PM
Raghunath Nair
11/22/2023, 5:15 PM
Raghunath Nair
11/22/2023, 5:15 PM
Raghunath Nair
11/22/2023, 5:15 PM
Raghunath Nair
11/22/2023, 8:57 PM
Raghunath Nair
11/22/2023, 8:59 PM
Nok Lam Chan
11/23/2023, 5:23 AM
fsspec at all.
In kedro-datasets 1.8.0, we do use fsspec: https://github.com/kedro-org/kedro-plugins/blob/16c6d5e144ad1f67afba9984ca606e13e51217e4/kedro-datasets/kedro_datasets/spark/spark_dataset.py#L355
I would be very surprised if you got the same error using different implementations.
Raghunath Nair
11/23/2023, 7:53 AM
Nok Lam Chan
11/23/2023, 7:54 AM
@Raghunath Nair Sorry, can we keep the conversation inside the thread?
Raghunath Nair
11/23/2023, 7:54 AM
Raghunath Nair
11/23/2023, 7:55 AM
Raghunath Nair
11/23/2023, 7:55 AM
Nok Lam Chan
11/23/2023, 7:56 AM
Raghunath Nair
11/23/2023, 7:56 AM
Raghunath Nair
11/23/2023, 7:56 AM
Raghunath Nair
11/23/2023, 7:56 AM
Raghunath Nair
11/23/2023, 7:57 AM
Nok Lam Chan
11/23/2023, 8:05 AM
fsspec
You can find this in adlfs, so regardless of abfs or abfss, fsspec is using the same AzureBlobFileSystem class.
    entry_points={
        "fsspec.specs": [
            "abfss=adlfs.AzureBlobFileSystem",
        ],
    },
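A minimal sketch of how one might check locally whether an installed package has registered abfss through the fsspec.specs entry-point group quoted above. It uses only the standard library; the helper name fsspec_entry_points is made up for illustration, and the result depends entirely on what is installed in the current environment.

```python
from importlib.metadata import entry_points


def fsspec_entry_points(group="fsspec.specs"):
    """Return {protocol: "module.Class"} for entry points in `group`.

    Hypothetical helper, not part of fsspec, adlfs, or kedro.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


registered = fsspec_entry_points()
print(registered.get("abfss", "abfss is not registered via entry points"))
```

If abfss is missing here even though adlfs appears installed, the package may live in a different environment from the one running the pipeline.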
Since you mentioned you never had to use adlfs before (I assume it is not installed, right?), I suspect the connection was done via Spark directly in 0.17.7, which is why I asked you to test with the old Spark.SparkDataset.
Both Spark / fsspec could work, but we don’t need to solve both at the same time.
Raghunath Nair
11/23/2023, 8:14 AM
Raghunath Nair
11/23/2023, 8:16 AM
Nok Lam Chan
11/23/2023, 8:16 AM
Nok Lam Chan
11/23/2023, 8:18 AM
Nok Lam Chan
11/23/2023, 8:19 AM
fsspec.filesystem() a path starting with abfss:// and see if it gives you an AzureBlobFileSystem class
Nok Lam Chan
11/23/2023, 8:20 AM
Raghunath Nair
11/23/2023, 8:20 AM
Raghunath Nair
11/23/2023, 8:21 AM
Raghunath Nair
11/23/2023, 8:21 AM
Nok Lam Chan
11/23/2023, 8:24 AM
fsspec.filesystem("abfss")
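Calling fsspec.filesystem("abfss") also instantiates the filesystem, so it can fail on missing credentials even when the protocol is registered correctly. A rough sketch of a check that only resolves the class, using fsspec.get_filesystem_class; the wrapper describe_protocol is a hypothetical helper, and the code assumes nothing about whether fsspec or adlfs is installed.

```python
def describe_protocol(protocol):
    """Describe how fsspec resolves `protocol` without instantiating it.

    Hypothetical helper for illustration only.
    """
    try:
        import fsspec
    except ImportError:
        return "fsspec is not installed"
    try:
        cls = fsspec.get_filesystem_class(protocol)
    except ImportError as exc:
        # Protocol is known to fsspec, but the backend package cannot be imported
        return f"backend missing: {exc}"
    except ValueError as exc:
        # Protocol is not known to fsspec at all
        return f"unknown protocol: {exc}"
    return f"{cls.__module__}.{cls.__qualname__}"


print(describe_protocol("abfss"))
```

A "backend missing" result points at the installation, while a resolved class name followed by an error on instantiation points at credentials or configuration instead.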
Nok Lam Chan
11/23/2023, 8:25 AM
File ~/GitHub/adlfs/adlfs/spec.py:318, in AzureBlobFileSystem.__init__(self, account_name, account_key, connection_string, credential, sas_token, request_session, socket_timeout, blocksize, client_id, client_secret, tenant_id, anon, location_mode, loop, asynchronous, default_fill_cache, default_cache_type, version_aware, assume_container_exists, max_concurrency, timeout, connection_timeout, read_timeout, **kwargs)
307 if (
I get an error from adlfs, so I do think it is registered properly.
Raghunath Nair
11/23/2023, 8:32 AM
Nok Lam Chan
11/23/2023, 8:34 AM
Raghunath Nair
11/23/2023, 8:38 AM
Nok Lam Chan
11/23/2023, 8:42 AM
adlfs installed. It’s very strange we are getting different results.
adlfs 2023.8.0
fsspec 2023.10.0
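To compare environments like this, both sides can print the installed versions from the interpreter that actually runs the pipeline. A small sketch using only the standard library; the helper installed_versions and the choice to include kedro and kedro-datasets in the list are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["adlfs", "fsspec", "kedro", "kedro-datasets"])
for name, ver in versions.items():
    print(f"{name} {ver or 'not installed'}")
```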
Nok Lam Chan
11/23/2023, 8:44 AM
Raghunath Nair
11/23/2023, 8:59 AM
Raghunath Nair
11/23/2023, 9:00 AM
Raghunath Nair
11/23/2023, 9:00 AM
Nok Lam Chan
11/23/2023, 9:05 AM
Raghunath Nair
11/23/2023, 9:07 AM