# questions
b
Can anyone suggest a good way of dynamically changing a catalog entry's path? For example, by default I want to use local paths for my intermediate datasets, but when I deploy to production I don't want anything to be saved locally. Duplicating the catalog.yml in the conf/production/ folder is not ideal, as I would have to maintain two copies of each catalog entry.
d
We’re in the middle of building a new Kedro catalog where some of these requirements are going to be covered. @Elena Khaustova, where is the best place to read up on this milestone?
But actually I think dataset factories may make your catalogs much simpler
Are you using this?
e
This particular feature won’t be in the new catalog, as we still suggest replacing the datasets instead of modifying them. What can be helpful here is creating two catalogs for different environments and using the one needed based on the current environment.
👍 1
m
What we do is to just change the path structure dynamically using env vars or globals. So a typical path would look like:
${globals:file_system}://${globals:prefix}/…
where file_system is "s3" in prod and "file" for local testing
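To make the substitution concrete, here is a rough plain-Python sketch of what the templated path turns into per environment. It only imitates the `${globals:...}` lookup with a regex and is not Kedro's or OmegaConf's implementation. The `GLOBALS` values (`my-bucket`, `/tmp/kedro-data`) and the `resolve` helper are hypothetical, and the trailing `…` is left elided as in the message above.

```python
import re

# Hypothetical per-environment globals, standing in for conf/<env>/globals.yml.
GLOBALS = {
    "local": {"file_system": "file", "prefix": "/tmp/kedro-data"},
    "production": {"file_system": "s3", "prefix": "my-bucket"},
}

# Same shape as the path in the message above; the tail stays elided.
TEMPLATE = "${globals:file_system}://${globals:prefix}/…"

_GLOBALS_REF = re.compile(r"\$\{globals:(\w+)\}")


def resolve(template: str, env: str) -> str:
    """Replace each ${globals:key} with the value defined for `env`."""
    values = GLOBALS[env]
    return _GLOBALS_REF.sub(lambda match: values[match.group(1)], template)


print(resolve(TEMPLATE, "production"))
print(resolve(TEMPLATE, "local"))
```

The catalog entry itself stays identical across environments; only the two globals differ.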
b
I ended up creating a quick and dirty solution, similar to @Matthias Roels's, by changing catalog file paths in settings.py. The only part I haven't figured out is how to know what the environment parameter is from settings.py. For example, kedro run should use whatever the default env is (in my project that would be local), but kedro run --env production should use production. This may be off topic and worth a new thread, but it's important as I still want to leverage the local filesystem when developing locally.
d
Ah settings.py isn’t related to that, what do you need to do with that information?
Hooks are registered in settings.py and can intercept the environment argument if needed
b
I can't share any of the code publicly so I'll try my best to paraphrase, but basically I want to do:
# settings.py

if env=='local':
    pass
elif env in ('production', 'staging'):
    change_catalog_filepaths()
change_catalog_filepaths() is working and doing exactly what I want; I just don't know of a non-hacky way to access env in settings.py
d
Okay gimme a sec to think about this
❤️ 1
Are you using dataset factories?
b
yes we are using some dataset factories, but not all catalog entries use them
d
Okay and how big is your catalog?
You can intercept a runtime parameter, set it as a global
And then use that argument in your file paths
b
len(OmegaConfigLoader(conf_source='conf/', **CONFIG_LOADER_ARGS).get('catalog').items())
is 53 in settings.py, but that's before the dataset factories get resolved into their own unique entries
d
Okay so I think templating your file paths to be driven by the environment argument is the way to go
So you’ll always have to run a cli command with the argument
But the minute you’re doing that sort of stuff in settings.py you’re kind of going out of bounds
I’d read through the runtime parameters/ global parameters / configuration environment docs
All should be there
b
hmmm ok... A few months ago I spent a good amount of time reading through kedro docs + source code to see if I could get the kedro environment without any hacks... but that was for a very different problem, which I was able to solve in a more "kedro approved" way (in that case it was a custom dataset + credentials set per ENV for an email alerting system)
d
Okay so there are more elegant solutions, but what you can do is drive everything by the KEDRO_ENV environment variable?
• Use an omegaconf resolver to inject the variable in the filepath
• Kedro will select the right environment based on this
• You can also use a before_command_run hook to set the env var if you drive it by the CLI
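A minimal sketch of the resolver idea for settings.py, assuming the project uses OmegaConfigLoader, which accepts extra resolvers under the `custom_resolvers` key of `CONFIG_LOADER_ARGS`. The `data_root` resolver name, the `DATA_ROOTS` mapping and the bucket names are hypothetical. Note that this reads only the KEDRO_ENV variable, so it will not see an environment passed purely via `--env` unless something (such as the hook mentioned above) sets the variable first.

```python
# settings.py (sketch)
import os

# Hypothetical mapping from Kedro environment name to a data root.
DATA_ROOTS = {
    "local": "data",
    "staging": "s3://my-staging-bucket",
    "production": "s3://my-production-bucket",
}


def data_root(default_env: str = "local") -> str:
    """Return the data root for the environment named in KEDRO_ENV.

    Falls back to `default_env` when the variable is not set.
    """
    env = os.environ.get("KEDRO_ENV", default_env)
    return DATA_ROOTS[env]


# Registers the resolver so a catalog entry could reference it,
# e.g. a filepath starting with ${data_root:local}
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {"data_root": data_root},
}
```

With this in place, running with KEDRO_ENV=production would point the same catalog entries at the production root, while an unset variable keeps them on the local data folder.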
👍 1
m
Why do you want to actually change the filepaths? You could just parametrise them using globals. This way, you keep the same structure, you don’t need any additional code and you can just put the different options in a globals file in your production/local kedro environment folders.
d
What Matthias is saying is correct
b
Yeah after sleeping on this I am going to use the global variable solution, ty for the help
@Matthias Roels I'm revisiting this, as I realized that when I run this locally it's saving to C:/data/01_raw/ instead of ./data/01_raw/. Is this expected, or is there a way to make globals.yaml dynamically resolve the absolute path of my Kedro project?
m
If you do:
filepath: ${globals:filesystem}/${globals:bucket}/…
with filesystem either s3:// or ./data, and bucket the name of your S3 bucket or a subfolder of data of your choosing
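The thread doesn't show which globals value produced C:/data/01_raw/, so the following is only one plausible explanation: on Windows, a value with a leading slash (such as /data) is rooted at the drive, whereas ./data or data is taken relative to the working directory. A small pathlib sketch of that difference, with a hypothetical working directory:

```python
from pathlib import PureWindowsPath

# Hypothetical working directory; the thread does not give the real one.
working_dir = PureWindowsPath("C:/projects/my-kedro-project")

# A rooted value (leading slash) keeps only the drive of the left-hand side.
rooted = working_dir / "/data/01_raw"

# A relative value is appended to the working directory.
relative = working_dir / "data/01_raw"

print(rooted.as_posix())    # C:/data/01_raw
print(relative.as_posix())  # C:/projects/my-kedro-project/data/01_raw
```

If that is what happened here, using ./data as the local filesystem value, as suggested above, keeps the output inside the project folder.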