Jan
11/30/2022, 10:29 AM
conf/base will not be loaded? I would like to do something like kedro run --env=prod, and in the prod env I have a catalog that is prefixed (e.g. file: data/prod/01_raw/file.txt) so that I can keep the prod data separate. I would like to avoid leakage of development data into the prod env. For example, if I add a new step and create a new entry in the data catalog (base) and forget to add this entry to the prod catalog, it will be used later on in the prod environment by default because it is not overwritten? Instead, I would like to get an error or implicitly use a MemoryDataset; in other words: don't load conf/base. Does this make sense? 😄
Edit: Just realizing that this behaviour would be possible if I just use conf/base as the prod env and always develop in a conf/dev env. However, ideally I would like to use conf/base by default and only work in prod by specifying it explicitly, to avoid mistakenly changing something there 🤔
datajoely
11/30/2022, 11:12 AM
local for the stuff you don’t want leaked
Jan
11/30/2022, 12:04 PM
conf/base/catalog/generated_feature_1.pq <-- overwritten by conf/prod/catalog/generated_feature_1.pq
If I now add something else during development:
conf/base/catalog/generated_feature_2.pq
This will not be overwritten by prod if it is not defined in the prod catalog. Running kedro run --env=prod will now use:
conf/prod/catalog/generated_feature_1.pq and conf/base/catalog/generated_feature_2.pq
meaning the pipeline runs through without errors, but I am using the development data at some point.
datajoely
11/30/2022, 12:05 PM
Jan
11/30/2022, 12:09 PM
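The leakage scenario described in this thread (a base catalog entry that prod never overrides) could be caught with a small pre-run check. The following is only a minimal sketch, not Kedro's own API: it assumes both catalogs have already been loaded into plain dictionaries keyed by dataset name, and the function names (merge_catalogs, find_unoverridden_entries, check_no_leakage) and the filepath values are hypothetical, chosen for illustration.

```python
from typing import Any, Dict, List

# Assumption: a catalog is a plain dict mapping dataset name -> dataset config.
Catalog = Dict[str, Dict[str, Any]]


def merge_catalogs(base: Catalog, env: Catalog) -> Catalog:
    """Overlay env entries on base entries, as described in the thread:
    an entry defined in env replaces the base entry of the same name."""
    merged = dict(base)
    merged.update(env)
    return merged


def find_unoverridden_entries(base: Catalog, env: Catalog) -> List[str]:
    """Return the names of base entries that env does not redefine."""
    return sorted(name for name in base if name not in env)


def check_no_leakage(base: Catalog, env: Catalog) -> None:
    """Raise instead of silently falling back to base entries."""
    missing = find_unoverridden_entries(base, env)
    if missing:
        raise ValueError(
            "Entries defined in base but not in env: " + ", ".join(missing)
        )


# Hypothetical catalogs mirroring the example in the thread.
base_catalog: Catalog = {
    "generated_feature_1": {"filepath": "data/dev/generated_feature_1.pq"},
    "generated_feature_2": {"filepath": "data/dev/generated_feature_2.pq"},
}
prod_catalog: Catalog = {
    "generated_feature_1": {"filepath": "data/prod/generated_feature_1.pq"},
}

merged_catalog = merge_catalogs(base_catalog, prod_catalog)
leaking_entries = find_unoverridden_entries(base_catalog, prod_catalog)
```

Here merged_catalog still points generated_feature_2 at the development path, which is exactly the silent fallback described above, while check_no_leakage(base_catalog, prod_catalog) would raise a ValueError naming that entry. Where such a check would be wired into a real project is left open.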