# questions
n
Hello Team, I have been following this for resolving S3 credentials at runtime: https://docs.kedro.org/en/1.0.0/extend/hooks/common_use_cases/#use-hooks-to-load-external-credentials However, I need to be able to connect to multiple S3 buckets (one for each dataset), and I need a few parameters at runtime to be able to assume an AWS role and get credentials: account_id, role_arn, etc. To do this with the above approach, my credential resolver hook would need to resolve based on the name of the credential, which could follow a special format (account_id/role_arn), and I cannot hardcode the names in the code; essentially I need lambda-like, dynamically computed values. Is this possible? Or would it be better to use a config resolver instead, as follows:
```yaml
weather:
  type: polars.EagerPolarsDataset
  filepath: s3a://your_bucket/data/01_raw/weather*
  file_format: csv
  credentials: ${s3_creds:123456789012,arn:role}
```
where `s3_creds` is a config resolver that returns a dictionary with access keys and secrets. One potential issue I see with this approach is that the credentials could expire if they are evaluated only at the beginning of the pipeline and not every time a load or save is performed. Is there any better way to achieve what I want?
• Dynamic credential resolution per dataset.
• Credential refresh at load/save time.
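For context, a rough sketch of what the resolver side of this could look like, assuming boto3 STS is used to assume the role; the function body, the credential-dict keys, and exactly where the `${s3_creds:...}` reference can live (catalog entry vs. credentials config) are assumptions, but registering custom resolvers via `CONFIG_LOADER_ARGS` in `settings.py` is Kedro's standard mechanism:

```python
# settings.py (sketch) -- register a custom OmegaConf resolver named "s3_creds".
import boto3


def s3_creds(account_id: str, role_name: str) -> dict:
    """Assume the given role via STS and return fsspec-style S3 credentials."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName="kedro-run")
    creds = response["Credentials"]
    return {
        "key": creds["AccessKeyId"],
        "secret": creds["SecretAccessKey"],
        "token": creds["SessionToken"],
    }


CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "s3_creds": s3_creds,
    },
}
```

As written, the resolver is evaluated once when the config is loaded, which is exactly the expiry concern raised above.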
d
Hmm... I'm not aware of any built-in mechanism to refresh credentials before load/save; this might have to be done with custom `before_dataset_loaded`/`before_dataset_saved` hooks, if necessary. Credentials aren't even always handled in the same way; for most filesystem-based datasets, you'd basically need to reconstruct the `fsspec.filesystem` object? I can't find anything on wanting to refresh credentials during pipeline runs with a quick search; maybe somebody else has run into it.
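For reference, a very rough sketch of that hook-based approach, assuming the catalog is captured in `after_catalog_created`, that a `refresh_s3_creds` helper exists, and that replacing the dataset's private `_fs` attribute is acceptable (none of this is an established Kedro pattern):

```python
# hooks.py (sketch) -- rebuild a dataset's filesystem before each load/save.
import fsspec
from kedro.framework.hooks import hook_impl


def refresh_s3_creds() -> dict:
    """Hypothetical helper: assume the role again and return fresh key/secret/token."""
    raise NotImplementedError


class RefreshCredentialsHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Keep a reference to the catalog so the per-dataset hooks can reach datasets.
        self._catalog = catalog

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str):
        self._refresh(dataset_name)

    @hook_impl
    def before_dataset_saved(self, dataset_name: str):
        self._refresh(dataset_name)

    def _refresh(self, dataset_name: str):
        dataset = self._catalog.get(dataset_name)  # accessor may vary by Kedro version
        if hasattr(dataset, "_fs"):
            # Replacing the private `_fs` attribute is an assumption about how
            # fsspec-based datasets are built, not a supported API.
            dataset._fs = fsspec.filesystem("s3", **refresh_s3_creds())
```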
d
it's a tricky use case; I think custom datasets or dataset hooks could be useful here. @Elena Khaustova, should we consider some catalog updates to make this credential resolution more dynamic? What do you think?
e
It looks like it’s related to https://github.com/kedro-org/kedro/discussions/4320, which totally makes sense, but not as a part of the catalog functionality
👍 1
d
@Dmitry Sorokin @Elena Khaustova I think the part of the problem that's still not addressed is: how do you refresh credentials? I suppose we're saying you essentially need to build "custom datasets or dataset hooks" for this? I think that makes sense, since I can't imagine we'd change the baseline behavior to refresh credentials on every save/load? Unless it was to make this configurable; it's just that I haven't seen this request before.
n
@Deepyaman Datta Could it be done in two different parts?
1. Ability to inject a lambda-like "credential_provider" into datasets.
2. (Specifically for EagerPolarsDataset) Initialize the FileSystem object in the load and save methods instead of the constructor.
This would mean that you could call the credential provider and get new credentials on every call of load and save.
💡 1
d
> (Specifically for EagerPolarsDataset) Initialize the FileSystem object in the load and save methods instead of the constructor. This would mean that you could call the credential provider and get new credentials on every call of load and save.
I think this could work, but I feel like it should eventually be done for all datasets that are `fsspec`-based. My initial concern was that recreating the filesystem each time would be a bad idea. However, most `fsspec.AbstractFileSystem` instances are cachable: instances are cached based on the arguments used to initialize the filesystem, plus anything in `_extra_tokenize_attributes`. As such, it seems like it shouldn't be a problem to repeatedly create the filesystem.
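A quick illustration of that caching behaviour, using the in-memory filesystem as a stand-in:

```python
import fsspec

# Constructing a filesystem twice with the same arguments returns the cached instance.
fs1 = fsspec.filesystem("memory")
fs2 = fsspec.filesystem("memory")
assert fs1 is fs2

# Clearing the instance cache forces a fresh object on the next construction,
# which is what you'd want after credentials change.
type(fs1).clear_instance_cache()
fs3 = fsspec.filesystem("memory")
assert fs3 is not fs1
```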
> Ability to inject a lambda-like "credential_provider" into datasets.
You can already use a resolver for credentials keys (`credentials` itself has to be a dict), but to inject a lambda as `credentials`, I guess we would need to support a resolver there. One possibility is to iterate through all the keys in `credentials` and, if something is a function object, call it. This is, in a way, how Kedro supports lazy partitions. The filesystem creation pieces would then go into some helper function, and it would need to be called from every method that requires the filesystem object. If there were some sort of signal that you need a new filesystem object (maybe the fact that a `credentials` key is a function is a sufficient signal for that? or it could be a more explicit dataset option), then you would need to do `type(self._fs).clear_instance_cache()` before constructing it.
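A rough sketch of such a helper, assuming callable `credentials` values are the signal to rebuild the filesystem (the helper name and the callable-credentials convention are hypothetical, not existing Kedro behaviour):

```python
import fsspec


def _get_filesystem(protocol, credentials, fs_args=None):
    """Resolve callable credential values and return a (fresh) fsspec filesystem."""
    fs_args = fs_args or {}
    resolved = {}
    needs_fresh_fs = False
    for key, value in credentials.items():
        if callable(value):
            # A callable credential value is the (assumed) signal that the
            # filesystem must be rebuilt with freshly fetched credentials.
            resolved[key] = value()
            needs_fresh_fs = True
        else:
            resolved[key] = value

    if needs_fresh_fs:
        # Drop cached instances so fsspec doesn't hand back a filesystem that
        # was built with the old, possibly expired, credentials.
        fsspec.get_filesystem_class(protocol).clear_instance_cache()
    return fsspec.filesystem(protocol, **resolved, **fs_args)
```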
So this looks feasible, but it's a fairly nontrivial change. Maybe it makes sense to try this in a custom version of `EagerPolarsDataset` (if that's the main one you're using right now) and see if it works well? If so, you could contribute it upstream and standardize the pattern across all the `fsspec`-based datasets. (This probably requires a bigger review, but if it works without issue and isn't constructing a filesystem instance each time in the current case, I think this could be fine.)
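For concreteness, a minimal sketch of what that custom dataset could look like, assuming a hypothetical `credential_provider` callable argument and delegating the actual I/O to the parent class (private attributes like `_fs`/`_protocol` and the exact load/save method names may differ across kedro-datasets versions, so treat this as an outline rather than a drop-in implementation):

```python
from typing import Any, Callable

import fsspec
from kedro_datasets.polars import EagerPolarsDataset


class RefreshingEagerPolarsDataset(EagerPolarsDataset):
    """EagerPolarsDataset variant that rebuilds its filesystem on every load/save."""

    def __init__(self, *, credential_provider: Callable[[], dict], **kwargs: Any):
        # `credential_provider` is a hypothetical extra argument: a zero-argument
        # callable returning fresh fsspec credentials (e.g. key/secret/token).
        self._credential_provider = credential_provider
        super().__init__(**kwargs)

    def _refresh_fs(self) -> None:
        creds = self._credential_provider()
        # Clear fsspec's instance cache so we don't get back a filesystem that
        # was created with expired credentials, then rebuild it.
        type(self._fs).clear_instance_cache()
        self._fs = fsspec.filesystem(self._protocol, **creds)

    def load(self):
        self._refresh_fs()
        return super().load()

    def save(self, data) -> None:
        self._refresh_fs()
        super().save(data)
```

Since plain YAML can't express a callable, the `credential_provider` would have to be injected programmatically (for example by a hook that registers the dataset) or come from something like the resolver discussed earlier.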