# questions
j
question to the Kedro hive mind: I want to define a Delta dataset backed by a local MinIO instance, for which I need to specify some credentials as follows:
statuses_table:
  type: polars.EagerPolarsDataset
  file_format: delta
  filepath: s3://...
  save_args:
    storage_options:
      AWS_ENDPOINT_URL: "http://127.0.0.1:9010"
      AWS_ACCESS_KEY_ID: "minioadmin"
      AWS_SECRET_ACCESS_KEY: "minioadmin"
      AWS_REGION: "localhost"
      AWS_ALLOW_HTTP: "true"
      AWS_S3_ALLOW_UNSAFE_RENAME: "true"
naturally I don't want to commit the access key and secret key to `base/catalog.yml`, but on the other hand this doesn't really suit how `credentials.yml` works. the only way I can think of is using `local/globals.yml`, which works fine but somehow feels imperfect, and also makes me wonder why we need `credentials.yml` if using globals is more flexible. thoughts?
d
can you do this with env vars?
j
it's discouraged by Kedro too, `${oc.env}` is only available for credentials
t
Shouldn’t the `storage_options` be in `fs_args`? `fs_args` and `credentials` are merged and passed to the fsspec filesystem at its initialization.
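A minimal sketch of what that merge amounts to, using a stand-in function rather than fsspec itself. `fake_filesystem` is hypothetical, and the `key`, `secret`, and `client_kwargs` names are only illustrative values. It also shows that a key present in both dictionaries fails at call time instead of being silently overridden.

```python
def fake_filesystem(protocol, **storage_kwargs):
    """Hypothetical stand-in for fsspec.filesystem: reports what it received."""
    return {"protocol": protocol, **storage_kwargs}


# Illustrative values only; in Kedro these would come from credentials.yml
# and from the dataset's fs_args entry in the catalog.
credentials = {"key": "minioadmin", "secret": "minioadmin"}
fs_args = {"client_kwargs": {"endpoint_url": "http://127.0.0.1:9010"}}

# Both dictionaries are unpacked into one flat set of keyword arguments.
merged = fake_filesystem("s3", **credentials, **fs_args)

# The same key in both dictionaries raises TypeError when the call is made.
try:
    fake_filesystem("s3", **credentials, **{"key": "other"})
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)
```

Because everything ends up as filesystem keyword arguments, anything placed under `save_args` never reaches this call.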
m
Just use `oc.env` by registering it as a custom resolver.
d
should we just relax this limitation?
j
you're right @Takieddine Kadiri :
self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)
but in this particular case, `pl.DataFrame.write_delta` expects `storage_options` to be passed to it directly. I think there might be an expectations mismatch between how we think Polars is using fsspec and how Polars is actually not using it. related to the issue I opened a moment ago: https://github.com/kedro-org/kedro-plugins/issues/444
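To make the mismatch concrete, here is a hedged sketch of calling `write_delta` outside Kedro with the secrets read from the environment. The helper names `build_storage_options` and `save_delta` are hypothetical, the default endpoint is just the one from the catalog entry above, and `df` is assumed to be a `polars.DataFrame`.

```python
import os


def build_storage_options(env=None):
    """Build the storage_options mapping, reading secrets from the environment."""
    env = os.environ if env is None else env
    return {
        "AWS_ENDPOINT_URL": env.get("AWS_ENDPOINT_URL", "http://127.0.0.1:9010"),
        "AWS_ACCESS_KEY_ID": env["AWS_ACCESS_KEY_ID"],  # KeyError if unset
        "AWS_SECRET_ACCESS_KEY": env["AWS_SECRET_ACCESS_KEY"],
        "AWS_ALLOW_HTTP": "true",
        "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
    }


def save_delta(df, target, env=None):
    """Write df to a Delta table, handing the options straight to Polars."""
    # storage_options goes to write_delta itself, not to an fsspec filesystem.
    df.write_delta(target, storage_options=build_storage_options(env))
```

The point of the sketch is only that the options travel through the `write_delta` call, so a dataset that routes credentials to fsspec does not deliver them here.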
n
Different problems are being discussed here:
• The main one is how Polars is using fsspec.
• That `credentials` get merged into the `fsspec` arguments may not be obvious enough. Anything we can improve here?
• `credentials` vs `globals` vs env vars. Credentials has always been the weirder kind of configuration; it was created before `oc.env` was available. `credentials.yml` is almost always used locally, since it shouldn’t be added to version control, and in most production environments you don’t even have access to these variables, they are controlled by a sysadmin. In a way, `credentials` is a special kind of OmegaConf variable interpolation built into Kedro, and it became more redundant in 0.18.
  ◦ Alternatively, you can also provide credentials with Hooks: https://docs.kedro.org/en/stable/hooks/common_use_cases.html#use-hooks-to-load-external-credentials