NAYAN JAIN
10/29/2025, 1:56 PM

    weather:
      type: polars.EagerPolarsDataset
      filepath: s3a://your_bucket/data/01_raw/weather*
      file_format: csv
      credentials: ${s3_creds:123456789012,arn:role}

where s3_creds is a config resolver that returns a dictionary with access keys and secrets. One potential issue I see with this approach is that the credentials could expire if they are evaluated only once at the beginning of the pipeline and not every time a load or save is performed.
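For context, s3_creds is registered as a custom resolver through CONFIG_LOADER_ARGS in settings.py. The sketch below shows the shape of it; the assume-role body is illustrative, not the exact implementation:

```python
# settings.py -- sketch only: the resolver name matches the catalog entry,
# but the assume-role body below is illustrative rather than the real code.
import boto3


def s3_creds(account_id: str, role_name: str) -> dict:
    """Return a fresh credentials dict that Kedro passes through to fsspec/s3fs."""
    response = boto3.client("sts").assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName="kedro-pipeline",
    )
    creds = response["Credentials"]
    return {
        "key": creds["AccessKeyId"],
        "secret": creds["SecretAccessKey"],
        "token": creds["SessionToken"],
    }


CONFIG_LOADER_ARGS = {"custom_resolvers": {"s3_creds": s3_creds}}
```

Because the resolver is interpolated when the catalog configuration is loaded, the returned keys stay fixed for the rest of the run, which is exactly the expiry problem above.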
Is there any better way to achieve what I want?
• Dynamic credential resolution per dataset.
• Credential refresh at load/save time.
Deepyaman Datta
10/29/2025, 6:27 PM

You could look at hooks (before_dataset_loaded, before_dataset_saved), if necessary. Credentials aren't even always handled in the same way; for most filesystem-based datasets, you'd basically need to reconstruct the fsspec.filesystem object?

With a quick search I can't find anything on refreshing credentials during pipeline runs; maybe somebody else has run into it.
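Roughly, a hook-based version would look like the sketch below; refresh_creds is a placeholder for whatever returns new credentials, and it has to reach into private dataset attributes (_fs, _protocol), which is part of why it's awkward:

```python
# hooks.py -- rough sketch of refreshing credentials from hooks.
# refresh_creds() is a hypothetical provider; _get_dataset, _fs and _protocol
# are private/internal, which is why this approach is fragile.
import fsspec
from kedro.framework.hooks import hook_impl


def refresh_creds() -> dict:
    """Placeholder: return a fresh {"key": ..., "secret": ..., "token": ...} dict."""
    raise NotImplementedError


class RefreshCredentialsHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Keep a handle on the catalog so the per-dataset hooks can reach into it.
        self._catalog = catalog

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str):
        dataset = self._catalog._get_dataset(dataset_name)
        if hasattr(dataset, "_fs"):  # only filesystem-backed datasets
            dataset._fs = fsspec.filesystem(dataset._protocol, **refresh_creds())

    @hook_impl
    def before_dataset_saved(self, dataset_name: str):
        self.before_dataset_loaded(dataset_name)
```

And datasets that hand credentials straight to the underlying library instead of going through _fs wouldn't pick up the new values at all, which is what I mean by credentials not being handled the same way everywhere.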
Dmitry Sorokin
10/30/2025, 9:56 AM

Elena Khaustova
10/30/2025, 10:50 AM

Deepyaman Datta
10/30/2025, 1:11 PM

NAYAN JAIN
10/30/2025, 3:13 PM

Deepyaman Datta
10/30/2025, 7:14 PM

> (Specifically for EagerPolarsDataset) -> Initialize FileSystem object in load and save methods instead of constructor. This would mean that you could call the credential provider and ask it to provide new credentials during every call of load and save.

I think this could work, but I feel like it should eventually be done for all datasets that are fsspec-based.

My initial concern was that recreating the filesystem each time would be a bad idea. However, most fsspec.AbstractFileSystem instances are cachable: instances are cached based on the arguments used to initialize the filesystem, plus anything in _extra_tokenize_attributes. As such, it seems like it shouldn't be a problem to repeatedly create the filesystem.
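Concretely, for EagerPolarsDataset the change could look roughly like this subclass sketch; credential_provider stands in for whatever callable supplies fresh credentials, and fs_args handling is omitted for brevity:

```python
# Rough sketch of the proposal for EagerPolarsDataset: rebuild the filesystem
# from fresh credentials on every load/save instead of once in the constructor.
# `credential_provider` is a placeholder callable; fs_args are ignored here.
import fsspec
import polars as pl
from kedro_datasets.polars import EagerPolarsDataset


class RefreshingEagerPolarsDataset(EagerPolarsDataset):
    def __init__(self, *args, credential_provider=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._credential_provider = credential_provider

    def _refresh_fs(self) -> None:
        if self._credential_provider is not None:
            # fsspec caches instances by their constructor arguments, so unchanged
            # credentials reuse the cached filesystem and new credentials create
            # (and cache) a new instance.
            self._fs = fsspec.filesystem(self._protocol, **self._credential_provider())

    def load(self) -> pl.DataFrame:
        self._refresh_fs()
        return super().load()

    def save(self, data: pl.DataFrame) -> None:
        self._refresh_fs()
        super().save(data)
```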
> Ability to inject lambda like "credential_provider" into datasets.

You can already use a resolver for the keys inside credentials (credentials itself has to be a dict), but to inject a lambda as credentials, I guess Kedro would need to support a resolver there. One possibility is to iterate through all the keys in credentials and, if a value is a function object, call it. This is, in a way, how Kedro supports lazy partitions.

The filesystem-creation pieces would then go into some helper function, which would need to be called by every method that requires the filesystem object. If there were some sort of signal that you need a new filesystem object (maybe the fact that a credentials key is a function is a sufficient signal for that? or it could be a more explicit dataset option), then you would need to call type(self._fs).clear_instance_cache() before constructing it.
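As a sketch of that helper (the names are illustrative, not an existing Kedro API):

```python
# Sketch of the helper idea: resolve any callable credential values right before
# building the filesystem, and clear fsspec's instance cache when a refresh is
# requested so a stale cached instance isn't reused. Names are illustrative.
import fsspec


def _resolved_credentials(credentials: dict) -> dict:
    # Call any value that is a function object (similar in spirit to how Kedro
    # resolves lazy partitions) and keep everything else as-is.
    return {key: value() if callable(value) else value for key, value in credentials.items()}


def get_filesystem(protocol: str, credentials: dict, refresh: bool = False):
    if refresh:
        # Equivalent to type(self._fs).clear_instance_cache() inside a dataset.
        fsspec.get_filesystem_class(protocol).clear_instance_cache()
    return fsspec.filesystem(protocol, **_resolved_credentials(credentials))
```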
Deepyaman Datta
10/30/2025, 7:16 PM

Maybe prototype it for EagerPolarsDataset (if that's the main one you're using right now) and see if that works well? If so, you could contribute it upstream and standardize the pattern across all the fsspec-based datasets.

(This probably requires some bigger review, but if it works without issue, and it isn't constructing a filesystem instance each time in the current case, I think this could be fine.)