#questions

Elaine Resende

11/20/2023, 7:41 PM
Hi everyone, is it possible to pass a dictionary as the value for the --params argument on the command line? If so, how should I do it? I tried:
kedro run --pipeline=inference --params=job="{'job_id':'98727','timezone':'-8.0'},source=mwb,version=1"
but I got a ParserError...
ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 1:
    {'job_id':'98727'
    ^
expected ',' or '}', but got '<stream end>'
  in "<unicode string>", line 1, column 18:
    {'job_id':'98727'
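The traceback makes sense once you see where the value string gets cut: the key=value pairs are split on commas before each value is YAML-parsed, so the inline dict is truncated at its first internal comma. A simplified sketch of the effect (not Kedro's actual parser):

```python
# Simplified illustration: a naive comma split truncates the inline dict at its
# first internal comma, which is why YAML then fails with
# "expected ',' or '}', but got '<stream end>'".
raw = "job={'job_id':'98727','timezone':'-8.0'},source=mwb,version=1"
pairs = raw.split(",")
key, _, value = pairs[0].partition("=")
print(value)  # {'job_id':'98727'  <- no closing brace left for the YAML parser
```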

datajoely

11/20/2023, 8:47 PM
So we have a --config option, so you can pass in a YAML file with nested config like you suggest here

Nok Lam Chan

11/21/2023, 2:34 AM
You can either use the dot syntax or use the config option.
👍 2
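The dot syntax Nok mentions addresses each nested key individually, e.g. --params job.job_id=98727,job.timezone=-8.0, which avoids commas inside a single value. A sketch of the idea behind dotted keys expanding into a nested dict (an illustration, not Kedro's implementation):

```python
def expand_dotted(flat):
    """Expand {'job.job_id': ...} into {'job': {'job_id': ...}} (illustrative)."""
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for key in parents:
            # walk/create intermediate dicts for each dotted segment
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested

print(expand_dotted({"job.job_id": "98727", "job.timezone": "-8.0", "source": "mwb"}))
# {'job': {'job_id': '98727', 'timezone': '-8.0'}, 'source': 'mwb'}
```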

Iñigo Hidalgo

11/21/2023, 10:44 AM
Oh wow I had not seen this option!! Is it a relatively recent addition?

datajoely

11/21/2023, 10:44 AM
the --config piece?

Iñigo Hidalgo

11/21/2023, 10:44 AM
yeah

datajoely

11/21/2023, 10:44 AM
it’s actually been in there a while
but we don’t do a great job of advertising it

marrrcin

11/21/2023, 10:45 AM
It’s the ONLY option so far that accepts lists/dicts 🙈
👀 1

datajoely

11/21/2023, 10:45 AM
and IIRC there is some inconsistency in how overrides work between config and CLI
can you do omegaconf resolving for CLI parsing?

Iñigo Hidalgo

11/21/2023, 10:48 AM
@click.option(
    "--config",
    "-c",
    type=click.Path(exists=True, dir_okay=False, resolve_path=True),
    help=CONFIG_FILE_HELP,
    callback=_config_file_callback,
)
@click.option("--params", type=str, default="", help=PARAMS_ARG_HELP, callback=_split_params)
def run(
    tag,
    env,
    parallel,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_version,
    pipeline,
    config,
    params,
):
    """Run the pipeline."""
    if parallel and runner:
        raise KedroCliError(
            "Both --parallel and --runner options cannot be used together. " "Please use either --parallel or --runner."
        )
    runner = runner or "SequentialRunner"
    if parallel:
        runner = "ParallelRunner"
    runner_class = load_obj(runner, "kedro.runner")

    tag = _get_values_as_tuple(tag) if tag else tag
    node_names = _get_values_as_tuple(node_names) if node_names else node_names

    package_name = str(Path(__file__).resolve().parent.name)
    with KedroSession.create(package_name, env=env, extra_params=params) as session:
        session.run(
            tags=tag,
            runner=runner_class(is_async=is_async),
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_version,
            pipeline_name=pipeline,
        )
This is the run function in my cli.py; it looks like we're somehow discarding that config, right?

marrrcin

11/21/2023, 10:54 AM
_config_file_callback updates the context

Iñigo Hidalgo

11/21/2023, 10:56 AM
def _config_file_callback(ctx, param, value):  # pylint: disable=unused-argument
    """Config file callback, that replaces command line options with config file
    values. If command line options are passed, they override config file values.
    """
    # for performance reasons
    import anyconfig  # pylint: disable=import-outside-toplevel

    ctx.default_map = ctx.default_map or {}
    section = ctx.info_name

    if value:
        config = anyconfig.load(value)[section]
        ctx.default_map.update(config)

    return value
I gotta read up on click, I don't fully understand this snippet. thanks for the pointer marrrcin 🙂
👍 1
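For anyone else puzzling over that snippet: the trick is click's ctx.default_map. The callback runs early, loads the config file, and installs its values as *defaults*, so any option typed explicitly on the command line still wins. A minimal standalone sketch of the mechanism (names and hard-coded values are invented for illustration; the real callback loads a YAML file instead):

```python
import click

def fake_config_callback(ctx, param, value):
    # Stand-in for _config_file_callback: instead of loading a YAML file,
    # install hard-coded values into the context's default_map.
    ctx.default_map = ctx.default_map or {}
    if value:
        ctx.default_map.update({"greeting": "hello from config"})
    return value

@click.command()
@click.option("--config", "-c", is_flag=True, is_eager=True,
              callback=fake_config_callback)
@click.option("--greeting", default="hi")
def cli(config, greeting):
    click.echo(greeting)
```

Running with --config echoes the default_map value, while passing --greeting explicitly overrides it, which matches Nok's note below that --params still takes priority over values supplied via --config.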

datajoely

11/21/2023, 10:57 AM
this should be smoother when you’re off 0.17.x @Iñigo Hidalgo!

Iñigo Hidalgo

11/21/2023, 10:59 AM
soon ™️
🚀 1

Nok Lam Chan

11/21/2023, 11:05 AM
@marrrcin is right, the callback will update the parameters automatically; --params will still take priority if both are defined. I used to use both: --config for some plugin metadata, and --params for Kedro's parameters

Iñigo Hidalgo

11/21/2023, 11:16 AM
@Nok Lam Chan this is actually a nice workaround for --params not playing well with dictionaries (in my version of Kedro; I know it was fixed later)

Elaine Resende

11/21/2023, 12:41 PM
So, I would have to add something like this to the config.yml?
run:
  tags: tag1, tag2, tag3
  pipeline: pipeline1
  parallel: true
  node_names: node1, node2
  env: env1
  params: job="{'job_id':'98727','timezone':'-8.0'}"

datajoely

11/21/2023, 12:44 PM
params:
    job:
        job_id: '98727'
        timezone: '-8.0'
that is, if you want the two values as strings rather than an int and a float

Elaine Resende

11/21/2023, 1:17 PM
Is there any other configuration needed besides passing the config.yml file? It is not recognizing the params in the DataCatalog at run time. I get:
ValueError: Pipeline input(s) {'job'} not found in the DataCatalog

marrrcin

11/21/2023, 1:27 PM
You cannot have params in the DataCatalog

datajoely

11/21/2023, 1:28 PM
ah yes they’re two different things
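A toy illustration of the distinction (simplified; a real catalog holds dataset objects, not just names): parameters passed at run time are exposed to pipelines under the `parameters` entry and `params:<name>` keys, so a node that wants the job dict should declare its input as `params:job`, not `job`:

```python
# Simplified sketch: runtime params surface as "parameters" and "params:<key>",
# never as a bare dataset name, so a node input declared as "job" is not found.
extra_params = {"job": {"job_id": "98727", "timezone": "-8.0"}}
catalog_keys = {"parameters"} | {f"params:{key}" for key in extra_params}

print("job" in catalog_keys)         # False -> "Pipeline input(s) {'job'} not found"
print("params:job" in catalog_keys)  # True  -> declare the node input as "params:job"
```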

Elaine Resende

11/21/2023, 2:16 PM
Got it! All params should be in the config.yml. I think my problem is more intricate than I initially anticipated. I want to execute pipelines every time there is a request from an API (FastAPI). A service will request an inference and I have to respond quickly, so I have to pass the request to the pipeline, execute it, and respond.
1. Is there a mechanism to keep the Kedro context open, so I can receive and respond to requests without reloading the session for each request?
2. How do I pass parameters to the pipeline at run time, taken from the service's request?