#questions

Vassilis Kalofolias

02/16/2023, 11:06 AM
Hello, I am trying to override a bool parameter using the CLI (running from bash):
kedro run --params round_occupancy:False
However, the False is read as a string. Is there a way to pass a boolean instead? Note that the original param is correctly read from the YAML file as a bool.

datajoely

02/16/2023, 11:06 AM
false, I think:
In [3]: import yaml

In [4]: yaml.safe_load('test: false')
Out[4]: {'test': False}

Vassilis Kalofolias

02/16/2023, 11:22 AM
The YAML is read correctly. The problem is when I pass it to the CLI (from bash).
kedro run --params round_occupancy:False -> python reads str 'False'
kedro run --params "round_occupancy: false" -> python reads str 'false'
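Until the CLI passes a real bool, one possible guard is to coerce the value where it is consumed. The sketch below assumes the parameter arrives either as a bool (from the YAML file) or as one of the strings reported above; the helper name as_bool is hypothetical and not part of Kedro.

```python
def as_bool(value):
    """Coerce a parameter that may arrive as a bool or as a CLI string.

    Hypothetical helper for illustration, not part of Kedro.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "yes", "on", "1"}:
            return True
        if text in {"false", "no", "off", "0"}:
            return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# A plain bool("False") call does not help: any non-empty string is truthy.
naive = bool("False")

# The helper maps the string from the CLI back to the intended value.
round_occupancy = as_bool("False")
```

Raising on unknown input, rather than falling back to a default, keeps a mistyped override from silently changing the run.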

datajoely

02/16/2023, 11:22 AM
ah that’s annoying
okay it’s then getting coerced by click badly
so there is a way to get round this sort of
kedro run --config config.yaml
which you can do in an ugly one liner this way
echo 'params:\n\tround_occupancy: False' > config.yaml && kedro run --config config.yaml

Vassilis Kalofolias

02/16/2023, 2:01 PM
Hmm that's weird, there seems to be a problem with the --config argument (or I am doing something wrong):
$ poetry run kedro run --config config.yaml 

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/bin/kedro:8 in         │
│ <module>                                                                                         │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/kedro/framework/cli/cli.py:211 in main                                                     │
│                                                                                                  │
│   208 │   """                                                                                    │
│   209 │   _init_plugins()                                                                        │
│   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 211 │   cli_collection()                                                                       │
│   212                                                                                            │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:1130 in __call__                                                             │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/kedro/framework/cli/cli.py:139 in main                                                     │
│                                                                                                  │
│   136 │   │   )                                                                                  │
│   137 │   │                                                                                      │
│   138 │   │   try:                                                                               │
│ ❱ 139 │   │   │   super().main(                                                                  │
│   140 │   │   │   │   args=args,                                                                 │
│   141 │   │   │   │   prog_name=prog_name,                                                       │
│   142 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:1055 in main                                                                 │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:1655 in invoke                                                               │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:920 in make_context                                                          │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:1378 in parse_args                                                           │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:2360 in handle_parse_result                                                  │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/click/core.py:2322 in process_value                                                        │
│                                                                                                  │
│ /home/<username>/<--------         path_to_kedro_project    ------>/.venv/lib/python3.9/site-pac │
│ kages/kedro/framework/cli/utils.py:377 in _config_file_callback                                  │
│                                                                                                  │
│   374 │   section = ctx.info_name                                                                │
│   375 │                                                                                          │
│   376 │   if value:                                                                              │
│ ❱ 377 │   │   config = anyconfig.load(value)[section]                                            │
│   378 │   │   ctx.default_map.update(config)                                                     │
│   379 │                                                                                          │
│   380 │   return value                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'run'

datajoely

02/16/2023, 2:02 PM
hmm
can you show me the contents of your config.yml?

Vassilis Kalofolias

02/16/2023, 2:05 PM
I tried with:
v1:
params:\n\tround_occupancy: False
v2:
parameters:
  round_occupancy: False
v3:
params:
  round_occupancy: False

datajoely

02/16/2023, 2:10 PM
and all failed?

Vassilis Kalofolias

02/16/2023, 2:10 PM
Yes..

datajoely

it looks like the top level key needs to be run
sorry I wasn’t aware

Vassilis Kalofolias

02/16/2023, 2:11 PM
aahh let me try
💪🏼 it worked! Thanks a lot!!
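For reference, a sketch of a config file that matches what was found above. The thread only confirms that the top-level key must be run; nesting params beneath it mirrors the v3 attempt and is an assumption, as are the names CONFIG_TEXT and write_run_config. Note that YAML does not allow tabs for indentation, so the file uses spaces.

```python
from pathlib import Path
import tempfile

# Assumed layout: top-level `run` (confirmed in the thread), with `params`
# nested beneath it (inferred from the v3 attempt). Spaces only, no tabs.
CONFIG_TEXT = (
    "run:\n"
    "  params:\n"
    "    round_occupancy: false\n"
)


def write_run_config(directory):
    """Write the config file into `directory` and return its path."""
    path = Path(directory) / "config.yaml"
    path.write_text(CONFIG_TEXT, encoding="utf-8")
    return path


# Demonstrated in a temporary directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_run_config(tmp)
    written = config_path.read_text(encoding="utf-8")

# In a real project the file would be written next to the project root and
# used with: kedro run --config config.yaml
```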

datajoely

02/16/2023, 2:16 PM
😅 fantastic

Vassilis Kalofolias

02/16/2023, 2:16 PM
OK now next level: What is the easiest way to override parameters programmatically? (without CLI)? I am trying to run a pipeline in jupyter, with different parameters for each run.
(sorry for insisting on these "dynamic" use cases 😅)

datajoely

02/16/2023, 2:16 PM
hooks
so the before_pipeline_run hook has access to the catalog and you can just mutate them there
but jupyter may make that a bit difficult
jupyterlab may be easier as you could just open a separate window, edit the file and then %reload_kedro in another

Vassilis Kalofolias

02/16/2023, 2:20 PM
I work in VS code so editing a file is not difficult. But I was hoping to make it all in code (without editing a file) in a self-contained notebook.

datajoely

02/16/2023, 2:21 PM
I guess you could use the python API, but you do lose some features. This is a typical scaffold for testing a Kedro pipeline in pytest:
from kedro.io import DataCatalog
from kedro.runner import SequentialRunner

from my_pipe import create_pipeline


def test_pipeline():
    pipe = create_pipeline()
    runner = SequentialRunner()
    catalog = DataCatalog()
    runner.run(pipe, catalog)
you don’t need any YAML for that

Vassilis Kalofolias

02/16/2023, 2:22 PM
So theoretically I should directly change the dictionary context.params?

datajoely

02/16/2023, 2:22 PM
you don’t get some things like hooks
I think you can do catalog.add() but I’m not 100% sure

Vassilis Kalofolias

02/16/2023, 2:24 PM
yeah this is what I do to edit datasets:
catalog_edited = catalog.shallow_copy()
catalog_edited.add_all(..., replace=True)
But I have to check if it works with parameters.

datajoely

02/16/2023, 2:25 PM
I have a feeling there may be an issue with that approach, but in my opinion it SHOULD be the way for users to do this. Would you mind checking?

Vassilis Kalofolias

02/16/2023, 2:26 PM
yeah, this use-case is really important for me if I want my data scientists to use the lib 🙂 I'll let you know how it goes (or ask more questions :P)

datajoely

02/16/2023, 2:26 PM
👍

Vassilis Kalofolias

02/16/2023, 4:35 PM
So the easiest way for Jupyter is to reload kedro with a dict of extra_params, bypassing the magic. This fails to get False as a bool:
%reload_kedro --params co2_base:0.000415,round_occupancy:False
So we can do this instead:
from kedro.ipython import reload_kedro
reload_kedro(extra_params={"co2_base":0.000415, "round_occupancy": False})
This is a bit hacky (even though this is a public function so the magic has no benefit). I am now searching for a way to do it without a session. The problem is that the catalog has all levels of parameters, for example:
> catalog.list()

[
    'dataset1',
    'dataset2',
    'dataset3',
    'parameters',
    'params:param1',
    'params:group1',
    'params:group1.param1',
    'params:group1.param2',
    'params:group2',
    'params:group2.param1',
    'params:group2.param2',
    'params:group2.param3'
]
So the tricky thing is to not overwrite params individually, especially for the nested ones. This nesting is done through kedro.framework.context.context.KedroContext._get_feed_dict(), which automatically nests parameters to create a ready input for add_feed_dict(). Unfortunately this function is not public... I think add_feed_dict() should have an option like recursive: bool=False to have this behavior. This would allow for reproducing the behaviour of session in notebooks easily so that we reuse pipelines for prototyping more easily:
1. catalog with datasets
2. catalog.add_feed_dict(nested_params_dict, recursive=True)

def test_pipeline():
    pipe = create_pipeline()
    runner = SequentialRunner()
    catalog = DataCatalog()
    params_dict = {...}
    catalog.add_feed_dict(params_dict, recursive=True)
    runner.run(pipe, catalog)
The alternative would be possible if run would allow for extra_params.
What is in your opinion the most "kedro" acceptable style? My goal is to make it easy for a junior data scientist to use notebooks with a ready mature kedro project.
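In the meantime, the nesting described above can be reproduced in plain Python without touching the private method. This is a minimal sketch that mimics the key naming seen in the catalog.list() output above; build_feed_dict is a hypothetical helper, not Kedro's own implementation, and the parameter values are made up for illustration.

```python
def build_feed_dict(params):
    """Expand nested parameters into catalog-style entries.

    Hypothetical helper: produces 'parameters' plus one 'params:<dotted.path>'
    entry for every level of nesting, following the names listed above.
    """
    feed_dict = {"parameters": params}

    def add(name, value):
        feed_dict[f"params:{name}"] = value
        # Recurse so that nested groups get their own dotted entries.
        if isinstance(value, dict):
            for child_name, child_value in value.items():
                add(f"{name}.{child_name}", child_value)

    for name, value in params.items():
        add(name, value)
    return feed_dict


# Illustrative values only.
params = {
    "param1": 1,
    "group1": {"param1": 0.000415, "param2": False},
}
feed_dict = build_feed_dict(params)
for key in feed_dict:
    print(key)
```

The resulting dict could then be handed to catalog.add_feed_dict(feed_dict, replace=True) on a copy of the catalog. That call is not exercised here and should be checked against the installed Kedro version.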

datajoely

02/16/2023, 4:37 PM
so the annoying answer
is this will get better in the future
but today you need to hack about
both our Jupyter workflow and our Parameter consistency pieces are under development

Vassilis Kalofolias

02/16/2023, 4:39 PM
I am completely ok with that! I just want to make sure that the general kedro motto "don't set up dynamic runs" will not lead to making such things more difficult in the future.

datajoely

02/16/2023, 4:40 PM
well I think things will get better

Vassilis Kalofolias

02/16/2023, 4:40 PM
I am very happy to hear that :)) It is a great tool for data engineering right now, and also onboarding people who prototype would only make it more useful :))
Anyway, thanks a lot for the quick feedback! You are doing great work.
(shall I write about the above findings in some discussion about future dev somewhere?)

datajoely

02/21/2023, 10:45 AM
Hello @Vassilis Kalofolias - the False issue from the CLI is now fixed in 0.18.5 if you’d like to upgrade