# questions
n
Hello team! We'd like to use Kedro for our data pipelines and we've got a design question for you. One of our use cases is to load data from an external data provider through its API. The API is parameterised: `https://external-provider.com/api/contents/{content_id}`, so we created a dataset factory based on the `APIDataset` in the catalog. The idea would now be to pass Kedro a list of content ids we're interested in as a parameter. We thought we could dynamically create the dataset nodes in the pipeline by iterating over this list of content ids. To do so, we access the params in code following https://docs.kedro.org/en/0.19.5/configuration/parameters.html#how-to-load-parameters-in-code. It works well, but if we introduce `conf/prod/parameters.yml`, Kedro complains that there's a duplicate key (unless we hardcode `env="prod"` in `OmegaConfigLoader`). I'm sure it's a dumb question, but we couldn't find a way to access the env at runtime. Would you mind pointing us to a way to do so? Also, what do you think about the design we chose? Does it follow Kedro best practices? Thx for any pointers!
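For context, a dataset factory of the kind described might look roughly like this in `conf/base/catalog.yml` (the entry name and options here are illustrative, not the project's actual catalog):

```yaml
# Hypothetical dataset factory entry: the {content_id} placeholder is
# resolved from the dataset name requested by the pipeline, e.g.
# "content_data_42" -> content_id = 42.
"content_data_{content_id}":
  type: api.APIDataset
  url: https://external-provider.com/api/contents/{content_id}
  method: GET
```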
m
Hi @Nicolas P, it should be possible to have the same parameters across environments. Can you post the exact error you're getting?
n
Here's the error:

```
Traceback (most recent call last):
  line 435, in _check_duplicates
    raise ValueError(f"{dup_str}")
ValueError: Duplicate keys found in ****conf/production/parameters.yml and ****/conf/base/parameters.yml: my_key
```
The error disappears if we hardcode the env:

```python
conf_loader = OmegaConfigLoader(
    conf_source=conf_path,
    env="production",
    base_env="base",
)
```
m
Which version of Kedro are you using, and can you tell me if you have something like this in `settings.py`?

```python
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
}
```
n
Yes indeed, we've got this in `settings.py`:

```python
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
}
```

and we're on:

```
kedro~=0.19.6
```
m
Hmm, very odd.. I've tried to replicate the issue based on your info but it works fine for me. I have:

```yaml
# conf/base/parameters.yml
my_key: bla
```

and

```yaml
# conf/production/parameters.yml
my_key: bla
```

and this works fine. I only get that error if I have the same parameter in files within the same environment..
n
In our pipeline we call:

```python
from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import parse_content_data  # project-specific node function


def create_pipeline(**_kwargs) -> Pipeline:
    conf_path = str(settings.CONF_SOURCE)
    conf_loader = OmegaConfigLoader(conf_source=conf_path)
    content_ids = conf_loader["parameters"]["key"]

    # One node per content id, reading from the dataset factory entries.
    return pipeline(
        [
            node(
                func=parse_content_data,
                inputs=f"content_data_{c}",
                outputs=f"content_data_{c}_csv",
                name=f"parse_content_data_{c}_node",
            )
            for c in content_ids
        ]
    )
```
m
Ahhh right, now I'm getting the same error.
Yeah, I guess because you're directly creating an instance of the `OmegaConfigLoader`, you have to provide the environment there. Usually when using the CLI, Kedro will determine the environment properly for you.
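One way teams work around this is to capture the environment Kedro resolved for the run via a hook: `after_context_created` is a real Kedro hook spec (Kedro ≥ 0.18), and the context it receives exposes `.env`. The module-level stash and class name below are our own hypothetical convention, and the sketch deliberately omits the `@hook_impl` decorator so it has no Kedro dependency; a real project would add it and register the hook in `settings.py`:

```python
# Sketch: stash the runtime environment the framework resolved.
# RUNTIME_ENV and RuntimeEnvHook are hypothetical names, not Kedro API.
RUNTIME_ENV = {"env": None}


class RuntimeEnvHook:
    """Stores the config environment Kedro resolved for this run."""

    # In a real project this method would be decorated with @hook_impl
    # (from kedro.framework.hooks) and the class listed in settings.HOOKS.
    def after_context_created(self, context) -> None:
        RUNTIME_ENV["env"] = context.env
```

Code that runs after the context is created (though notably *not* `create_pipeline`, which Kedro calls during pipeline registration) could then read `RUNTIME_ENV["env"]` instead of hardcoding it.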
n
Ok but we're not sure how to access the environment at this point.
m
Hmmm, I have to think about this.. this is definitely not how we designed the config loader to be used. In general, it's not recommended to do this kind of dynamic pipeline creation, but there's a blog post that proposes a decent approach: https://getindata.com/blog/kedro-dynamic-pipelines/ I'm not entirely sure if that works for your case though.
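As a stop-gap, if the environment is always selected via the `KEDRO_ENV` environment variable (which, to our understanding, the Kedro CLI also consults) rather than the `--env` flag, a small helper can mirror that resolution when constructing the loader yourself. `resolve_env` is a hypothetical helper, not Kedro API, and the `"local"` default mirrors the `default_run_env` from `settings.py`:

```python
import os


def resolve_env(default: str = "local") -> str:
    """Resolve the config environment the way `kedro run` would when
    KEDRO_ENV is set; fall back to the project's default run env.

    Note: this does NOT see an environment passed via `--env`, which is
    why it is only a partial workaround.
    """
    return os.environ.get("KEDRO_ENV", default)
```

Usage would then be `OmegaConfigLoader(conf_source=conf_path, env=resolve_env(), base_env="base")`.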
n
Ok, thx a lot, we'll take a look! It also seemed to us that this solution was a bit hacky. Another solution I thought of would be to create a custom dataset that takes a list of content ids as data; internally it would dynamically create the sub-datasets, load them, and merge them. Generally speaking, we've got a lot of `APIDataset`s to deal with, and ideally we'd like generic pipelines that take parameters defining the specific API parameters.
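That custom-dataset idea can be sketched in plain Python, with no Kedro dependency. In a real implementation the class would subclass `kedro.io.AbstractDataset` and build `APIDataset` instances internally; here a plain callable stands in for the per-id load, and all names are hypothetical:

```python
from typing import Any, Callable


class MultiContentDataset:
    """Sketch: load several API payloads and merge them into one result.

    `load_one` stands in for whatever fetches a single content id
    (e.g. an APIDataset's load() in a real Kedro implementation).
    """

    def __init__(self, content_ids: list, load_one: Callable[[Any], Any]):
        self._content_ids = content_ids
        self._load_one = load_one

    def load(self) -> dict:
        # Merge the per-id payloads, keyed by content id.
        return {cid: self._load_one(cid) for cid in self._content_ids}
```

The appeal of this shape is that the pipeline stays static (one node consuming one merged dataset) while the dynamism moves into the dataset's `load`, which avoids building the pipeline from parameters at registration time.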
n
Thx a lot for the link, I'm having a look at it!