Iñigo Hidalgo
10/19/2023, 3:29 PM
import os
import polars as pl

pl.read_parquet(
    f"az://{os.environ['CONTAINER_NAME_ENV_KEY']}/data/02_intermediate/blablabla.parquet",
    storage_options={
        "account_name": os.environ["AZURE_STORAGE_ACCOUNT_DATA_NAME"],
        "anon": False,
    },
)
the following:
# credentials.yml
azure_blob:
  account_name: ${oc.env:AZURE_STORAGE_ACCOUNT_DATA_NAME}
  anon: false
# catalog.yml
input_data:
  type: polars.GenericDataset
  file_format: parquet
  filepath: az://${oc.env:CONTAINER_NAME_ENV_KEY}/data/blabla.parquet
  credentials: azure_blob
results in the following error when I try to load
--> 153 with self._fs.open(load_path, **self._fs_open_args_load) as fs_file:
154 return load_method(fs_file, **self._load_args)
File ~/.venv/lib/python3.10/site-packages/fsspec/spec.py:1241, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
1240 ac = kwargs.pop("autocommit", not self._intrans)
-> 1241 f = self._open(
1242 path,
1243 mode=mode,
1244 block_size=block_size,
1245 autocommit=ac,
1246 cache_options=cache_options,
...
201 )
--> 202 raise DatasetError(message) from exc
DatasetError: Failed while loading data from data set GenericDataset(file_format=parquet, filepath=/data/blablabla.parquet, load_args={}, protocol=az, save_args={}).
[Errno 2] No such file or directory: 'data/blablabla.parquet'
(Please ignore any inconsistencies in container names, filenames, etc. I tried to remove some information when pasting into Slack, but I probably wasn't super thorough.)
The container name is being stripped from the filepath which I assume is being supplied to fsspec somewhere else, but I'm not entirely sure why the load is failing when the pure polars call is working.
I know polars recently did away with fsspec and implemented their own native support for cloud (https://github.com/pola-rs/polars/pull/11210), but I'm not sure if it has anything to do with that.
az://${oc.env:CONTAINER_NAME_ENV_KEY}/data/blabla.parquet is being correctly interpolated to give the proper filepath.
Changing az to abfs fixes it, but az works with polars, and should work with fsspec too through adlfs.
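The symptom (the container disappears with az but not with abfs) would be consistent with a loader that splits the URL and only re-attaches the container for protocols on a fixed list. Below is a minimal sketch of that idea using only the standard library. The function name and the KNOWN_CLOUD_PROTOCOLS list are hypothetical and only illustrate the behaviour; they are not taken from the Kedro or fsspec source.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical allow-list: protocols for which the container/bucket
# (the URL "netloc") is glued back onto the path. Note "az" is absent.
KNOWN_CLOUD_PROTOCOLS = ("s3", "gs", "abfs")


def split_protocol_and_path(filepath: str) -> Tuple[str, str]:
    """Split a URL-style filepath into (protocol, path)."""
    parts = urlsplit(filepath)
    protocol = parts.scheme or "file"
    path = parts.path
    if protocol in KNOWN_CLOUD_PROTOCOLS:
        # Keep the container as the first path component.
        path = parts.netloc + path
    return protocol, path


print(split_protocol_and_path("abfs://mycontainer/data/blabla.parquet"))
# ('abfs', 'mycontainer/data/blabla.parquet')
print(split_protocol_and_path("az://mycontainer/data/blabla.parquet"))
# ('az', '/data/blabla.parquet')
```

With az the container ends up only in the URL's netloc and is dropped, which matches the filepath=/data/blablabla.parquet shown in the DatasetError. Whether this is the actual cause here would need to be confirmed against the dataset's path-parsing code.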