# questions
a
Hi everyone! First, thanks for reading my question. Now, onto it. I'm developing a kedro-pandera plugin to add data validation (I might open-source it once it has worked fine for some time on my company's end). In one of the commands I get the Kedro context through the session:
with KedroSession.create(metadata.package_name, project_path, env=env) as session:
    context = session.load_context()
I'm passing env here because I iterate over the environments to create schema files for the datasets selected through the command's arguments. Fast forward, I try to load a dataset from the Data Catalog:
dataset = context._get_catalog(save_version="")._get_dataset(dataset_name)
And here I experience an error:
$ kedro pandera init --env base

(...)

/usr/local/Caskroom/miniconda/base/envs/liquidity-prediction-env/lib/python3.10/site-packages/kedro/io/data_catalog.py:50 in _get_credentials

KeyError: "Unable to find credentials '<redacted>': check your data catalog and credentials configuration. See <https://kedro.readthedocs.io/en/stable/kedro.io.DataCatalog.html> for an example."
It seems that when I created the session, context, and then the catalog, none of them loaded conf/local/credentials.yml. Why is that? Is it on purpose (to prevent plugins from stealing credentials) or am I doing something wrong? Why does it work when session, context and catalog are created in the project itself while running kedro run? I'm using kedro==0.18.4.
j
hi @Aspen Olszewska! sorry for the digression and not directly answering your question - I don't want to discourage you from creating your plugin, but have you seen https://github.com/galileo-Galilei/kedro-pandera ?
👍 1
a
yes, I have. I created mine around two years ago, but I've now come back to it as I need data validation again. I developed kedro-mlflow with galileo-galilei in the early days.
👋 1
still, I'd like to know the reason for the discrepancy between session/context/catalog creation when running kedro run vs a plugin command
I didn't use any credentials back then, when I created the plugin, and now I do
@Juan Luis I have looked at the source code of the galileo's plugin and mine is more advanced/more developed tbh
👍🏼 1
👍 1
j
exciting 🔥
n
Hey, not sure if I have it right. I think the problem is that Kedro always runs with base + env (default: local)
What you want seems to be local + a custom env, if I understand correctly. If you pass in env="local", are the credentials loaded properly?
a
let me see
n
and re: kedro-pandera, we started developing it last year but I don't think anyone is using it actively. Happy to contribute to something that already exists if it's open source. Does it support the features described in https://github.com/Galileo-Galilei/kedro-pandera/issues?
a
ok, so that does do the trick of loading the credentials AND the base's Data Catalog
but now it saves the schemas in conf/local/...
n
Where do you expect it to save? And what does the code look like? I guess this is down to your implementation
a
I would need to load env passed through cmd and overlay local on top
yes, it is my implementation
it's not open source yet
I need to run it through my org first to see if I have a green light, but that shouldn't be a problem
👍🏼 1
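One way to get "the env from the command line with local on top" is to merge three layers yourself. Below is a rough sketch with plain dicts, assuming that top-level keys from later layers replace those from earlier ones. `layered_config` and the sample data are hypothetical names made up for illustration, not part of Kedro or of the plugin discussed here.

```python
from typing import Any, Dict, List


def layered_config(
    per_env: Dict[str, Dict[str, Any]], env_order: List[str]
) -> Dict[str, Any]:
    """Merge per-environment config dicts, lowest priority first.

    Hypothetical helper: later environments replace top-level keys
    of earlier ones. Environments missing from per_env are skipped.
    """
    merged: Dict[str, Any] = {}
    for env in env_order:
        merged.update(per_env.get(env, {}))
    return merged


# Made-up sample data: only "local" holds credentials.
credentials_by_env = {
    "base": {},
    "staging": {},
    "local": {"db_creds": {"user": "me", "password": "secret"}},
}

# Env passed on the command line ("staging" here), with local overlaid last.
creds = layered_config(credentials_by_env, ["base", "staging", "local"])
print(sorted(creds))
```

With this kind of split, only the credentials come from local, so the schema files can still be written to the env that was passed on the command line.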
n
This might be useful for you. Long story short, the Kedro default is base + local (which you can override via the CLI). There is a long discussion thread about this that I cannot find right now.
CONFIG_LOADER_ARGS = {
      "base_env": "base",
      "default_run_env": "local",
#       "config_patterns": {
#           "spark" : ["spark*/"],
#           "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
#       }
}
If you start a new Kedro project, you will find this in settings.py
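To make those defaults concrete, here is a minimal sketch of how the two settings decide which conf/ folders get read. It assumes the run environment is the `--env` value when one is given and `default_run_env` otherwise. `resolve_conf_envs` is a hypothetical helper written for illustration, not a Kedro function.

```python
from typing import List, Optional


def resolve_conf_envs(
    env: Optional[str] = None,
    base_env: str = "base",
    default_run_env: str = "local",
) -> List[str]:
    """Return the conf/ sub-folders that are read, lowest priority first.

    Assumption: an explicit env replaces default_run_env; it is not
    added on top of it.
    """
    run_env = env or default_run_env
    # When the run env is the base env, only one folder is effectively read.
    if run_env == base_env:
        return [base_env]
    return [base_env, run_env]


print(resolve_conf_envs())           # ['base', 'local']
print(resolve_conf_envs("base"))     # ['base']
print(resolve_conf_envs("staging"))  # ['base', 'staging']
```

Under that assumption, passing `--env base` only ever reads conf/base, so credentials that live in conf/local are not picked up.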
a
as for the features, I cannot tell. I didn't read any of the issues in Yolan's repo
n
so what "base" means is up to you, you can change it
I rarely see people do it, but it's possible
a
huh, interesting
thanks for the pointer
and for the help with the config loading part!
n
for your case you just need to flip the config
a
yup
I wonder how it works out of the box for regular kedro run though...
n
It would work too, but you have to keep in mind that what you call "local" is now the "base" everywhere Kedro uses it
local will have a lower priority than the custom env
which may be undesired
a
yeah, I mean credentials loading - it works right out of the box without flipping the base env name
by default it runs base but local is loaded as well
I wonder how that happens
might be best to replicate that
n
so out of the box Kedro runs base + local; if you don't specify anything, local will be loaded
what doesn't work after you flip it?
maybe I got a bit lost
a
it does work after flipping
but I mean pure kedro
pure kedro run
without any plugin
when you run kedro run --env base it still loads credentials from local
👀 1
with default settings
anyways, new thing to figure out! I'll go through the list of issues in Yolan's repo tomorrow and run open-sourcing my plugin by someone in my company. thank you for your help @Nok Lam Chan!
👍🏼 1
n
I see what you mean, I think I missed something but I gotta go now. I'll come back to this tomorrow, but let's see if someone beats me to it
I checked: kedro run --env base won't read local, can you double check?