Hello! I'd appreciate a bit of help with an issue...
# questions
a
Hello! I'd appreciate a bit of help with an issue that I'm facing. I have a kedro pipeline that is writing artifacts to an s3 bucket. In my project, I've got various envs: staging, dev, various experiments, etc. I'd like to separate the artifacts based on the env, but I only have one s3 bucket, so I was thinking of just doing it with directories in my bucket, like so:
"example_data_{x}_{y}":
  type: pandas.ParquetDataset
  filepath: "s3://path/to/{ENV}/rest/of/path/example_data_{x}_{y}.pqt"
  credentials: my_s3_creds
Currently ENV is an environment variable. I have figured out how to make the artifact's name dynamic, but how can I make the artifact's s3 filepath dynamic as well? Many thanks in advance! I'm using kedro 0.19.4.
n
Can it be done in a static way, i.e. literally have this configuration in all the environments you mentioned, each with a different default? That’s one of the points of having a catalog file that describes all the data in a specific environment.
a
Maybe? Maybe I never quite understood why the catalog was designed to be static, and if I understood that then I'd have my solution? As it is, I have a single code repository, and my deployment system reads my code repository and adds its own env vars, which affect what data sources (mostly SQL) that I read from and write to when the code executes. Should I have more than one catalog.yml maybe? In that case, how do I tell kedro to read one and not the other (based on an env var)?
If I try to just do this:
filepath: "s3://path/to/${ENV}/rest/of/path/example_data_{x}_{y}.pqt"
Then I get
omegaconf.errors.InterpolationKeyError: Interpolation key 'ENV' not found
or maybe a factory to generate catalog.yml on the fly? 🤔
n
kedro run --env? I think this is exactly why it should be static: dynamic configs are hard to understand, and the deployment team are not the ones who build the pipeline. They should have control of the things they need to control, but not of configuration like the hyperparameters of your model. https://docs.kedro.org/en/0.19.6/configuration/configuration_basics.html#configuration-environments Have you perhaps seen this already?
If you wish to use an environment variable as config, you need to use the oc.env resolver. https://docs.kedro.org/en/0.19.6/configuration/advanced_configuration.html We suggest not doing this when possible; by default the oc.env resolver only works for credentials.
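For reference, a rough sketch of what the catalog entry might look like with the resolver. The plain ${ENV} form fails because OmegaConf looks for a config key named ENV, not an environment variable. This assumes oc.env has been made available to the catalog, which as far as I know means registering it under custom_resolvers in CONFIG_LOADER_ARGS in settings.py; please check that against the advanced configuration docs. The "dev" fallback is a made-up default.

```yaml
# conf/base/catalog.yml
# Assumption: oc.env has been enabled for catalog files (it is not by default).
"example_data_{x}_{y}":
  type: pandas.ParquetDataset
  # ${oc.env:ENV,dev} reads the ENV environment variable;
  # "dev" is a hypothetical fallback used when ENV is unset.
  filepath: "s3://path/to/${oc.env:ENV,dev}/rest/of/path/example_data_{x}_{y}.pqt"
  credentials: my_s3_creds
```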
This adheres more closely to the twelve-factor app principle that configuration is external to the application. It's very simple to have a shared base folder for configuration and let your deployment team decide how to change that.
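A minimal sketch of the static version, assuming the standard conf/base plus conf/<env> layout and a globals.yml per environment. The key name env_prefix and the folder name staging are placeholders, not anything from your project. The catalog entry lives once in base, and each environment only overrides the one value that differs.

```yaml
# conf/base/globals.yml (default, used when no environment overrides it)
env_prefix: dev
---
# conf/staging/globals.yml (picked up when running with the staging environment)
env_prefix: staging
---
# conf/base/catalog.yml (shared by every environment)
"example_data_{x}_{y}":
  type: pandas.ParquetDataset
  filepath: "s3://path/to/${globals:env_prefix}/rest/of/path/example_data_{x}_{y}.pqt"
  credentials: my_s3_creds
```

The environment is then chosen at run time with kedro run --env=staging, or by having the deployment system set the KEDRO_ENV environment variable, so the pipeline code and the base catalog stay the same everywhere.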