# questions
j
Hi all! Is there a way to run an environment exclusively, meaning that `conf/base` will not be loaded? I would like to do something like `kedro run --env=prod`, and in the `prod` env I have a catalog that is prefixed (e.g. `file: data/prod/01_raw/file.txt`) so that the prod data is kept separate. I would like to avoid leakage of development data into the prod env. For example, if I add a new step, create a new entry in the data catalogue (`base`) and forget to add this entry to the prod catalog, it will later be used in the prod environment by default, because nothing overrides it. Instead I would like to get an error or implicitly use a MemoryDataset; in other words: don't load `conf/base`. Does this make sense? 😄
Edit: Just realizing that this behaviour would be possible if I just use `conf/base` as the prod env and always develop in a `conf/dev` env. However, ideally I would like to use `conf/base` by default and only work in prod by specifying it explicitly, to avoid mistakenly changing something there 🤔
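A minimal sketch of the workaround from the edit (prod lives in `conf/base`, development happens in `conf/dev`), assuming a recent Kedro version whose project `settings.py` forwards `CONFIG_LOADER_ARGS` to the config loader with `base_env` and `default_run_env` keys; check the configuration docs for your version. The `dev` name is just the one used in the question.

```python
# src/<package_name>/settings.py (excerpt)
# Assumption: your Kedro version accepts these keys in CONFIG_LOADER_ARGS.
CONFIG_LOADER_ARGS = {
    # conf/base holds the prod configuration in this setup.
    "base_env": "base",
    # Used when no --env is given, so a plain `kedro run` layers conf/dev over conf/base.
    "default_run_env": "dev",
}
```

With this layout, an entry added only to `conf/dev` is simply absent from the prod configuration, so a forgotten entry is not silently read from development paths in prod.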
d
So you should use `local` for the stuff you don’t want leaked
that’s not committed to git for this reason
j
By "leaked" I mean leakage of development data into production data, so it's not related to confidential data or git but to data in general (which is not in git anyway). I would like to be able to play around in the base environment without any risk of "leaking" data into the production environment. Like here:
`conf/base/catalog/generated_feature_1.pq` <-- overwritten by `conf/prod/catalog/generated_feature_1.pq`
If I now add something else during development:
`conf/base/catalog/generated_feature_2.pq`
this will not be overwritten by prod if it is not defined in the prod catalog. Running the pipeline with `kedro run --env=prod` will now use:
`conf/prod/catalog/generated_feature_1.pq` and `conf/base/catalog/generated_feature_2.pq`
meaning the pipeline runs through without errors, but I am using the development data at some point.
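A simplified, self-contained model of the behaviour described above, assuming the env catalog overrides `base` per top-level entry (a rough approximation of Kedro's layering, not its actual config loader). `merge_catalogs`, `leaked_entries` and `check_no_leakage` are hypothetical helpers, and the file paths are made up for illustration.

```python
from __future__ import annotations


def merge_catalogs(base: dict, env: dict) -> dict:
    """Return the effective catalog: env entries replace same-named base entries."""
    merged = dict(base)  # shallow copy so the inputs are not mutated
    merged.update(env)   # env wins per top-level entry; base-only entries survive
    return merged


def leaked_entries(base: dict, env: dict) -> list[str]:
    """Names the merged catalog would take from base because env never overrides them."""
    return sorted(name for name in base if name not in env)


def check_no_leakage(base: dict, env: dict) -> None:
    """Hypothetical guard: fail loudly instead of silently reading development data."""
    leaked = leaked_entries(base, env)
    if leaked:
        raise ValueError(f"catalog entries not overridden by the env: {leaked}")


# Hypothetical catalogs mirroring the example in the thread.
base_catalog = {
    "generated_feature_1": {"filepath": "data/01_raw/generated_feature_1.pq"},
    "generated_feature_2": {"filepath": "data/01_raw/generated_feature_2.pq"},
}
prod_catalog = {
    "generated_feature_1": {"filepath": "data/prod/01_raw/generated_feature_1.pq"},
}

if __name__ == "__main__":
    effective = merge_catalogs(base_catalog, prod_catalog)
    print(effective["generated_feature_2"]["filepath"])  # data/01_raw/generated_feature_2.pq
    print(leaked_entries(base_catalog, prod_catalog))    # ['generated_feature_2']
```

A check like `check_no_leakage` could run in a test before a prod run; raising on base-only entries corresponds to the "get an error" option from the question.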
d
If you have one environment, e.g. base, and local is not committed to git, then your PROD configuration would error if it's not configured right?
So you have a safety net
j
updated a bit to clarify