# questions
e
Hi everyone, is it possible to pass a dictionary as the value for the --params argument on the command line? If so, how should I do it? I tried:
kedro run --pipeline=inference --params=job="{'job_id':'98727','timezone':'-8.0'},source=mwb,version=1"
but I got a ParserError...
ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 1:
    {'job_id':'98727'
    ^
expected ',' or '}', but got '<stream end>'
  in "<unicode string>", line 1, column 18:
    {'job_id':'98727'
                     ^
d
So we have a --config option so you can pass in a YAML file with nested config like you suggest here
n
You can either use the dot syntax or the --config option.
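(The ParserError happens because --params first splits its value on commas, so the dict is cut down to {'job_id':'98727' before it ever reaches the YAML parser; hence the "stream end" at column 18.) With the dot syntax there are no braces to split. On recent Kedro versions (0.18+, where the key separator is = rather than the older :) that would look something like:
kedro run --pipeline=inference --params="job.job_id=98727,job.timezone=-8.0,source=mwb,version=1"
(check kedro run --help on your install, since the exact separator is version-dependent).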
👍 2
i
Oh wow I had not seen this option!! Is it a relatively recent addition?
d
the --config piece?
i
yeah
d
it’s actually been in there a while
but we don’t do a great job of advertising it
m
It’s the ONLY option so far that accepts lists/dicts 🙈
👀 1
d
and IIRC there is some inconsistency in how overrides work between config/cli
can you do omegaconf resolving for CLI parsing?
i
@click.option(
    "--config",
    "-c",
    type=click.Path(exists=True, dir_okay=False, resolve_path=True),
    help=CONFIG_FILE_HELP,
    callback=_config_file_callback,
)
@click.option("--params", type=str, default="", help=PARAMS_ARG_HELP, callback=_split_params)
def run(
    tag,
    env,
    parallel,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_version,
    pipeline,
    config,
    params,
):
    """Run the pipeline."""
    if parallel and runner:
        raise KedroCliError(
            "Both --parallel and --runner options cannot be used together. " "Please use either --parallel or --runner."
        )
    runner = runner or "SequentialRunner"
    if parallel:
        runner = "ParallelRunner"
    runner_class = load_obj(runner, "kedro.runner")

    tag = _get_values_as_tuple(tag) if tag else tag
    node_names = _get_values_as_tuple(node_names) if node_names else node_names

    package_name = str(Path(__file__).resolve().parent.name)
    with KedroSession.create(package_name, env=env, extra_params=params) as session:
        session.run(
            tags=tag,
            runner=runner_class(is_async=is_async),
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_version,
            pipeline_name=pipeline,
        )
This is the run function in my cli.py. It looks like we're somehow discarding that config, right?
m
_config_file_callback updates the context
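Concretely, it copies the YAML values into click's ctx.default_map, and defaults always lose to anything passed explicitly on the command line. A minimal standalone sketch of that click mechanism (illustrative only, not Kedro's code):
import click

@click.group()
def cli():
    """Toy CLI demonstrating click's default_map."""

@cli.command()
@click.option("--pipeline", default=None)
def run(pipeline):
    click.echo(f"pipeline={pipeline}")

if __name__ == "__main__":
    # default_map seeds per-command defaults, keyed by command name,
    # which is what _config_file_callback's ctx.default_map.update(config) does;
    # an explicit --pipeline on the command line still overrides it.
    cli(default_map={"run": {"pipeline": "inference"}})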
i
def _config_file_callback(ctx, param, value):  # pylint: disable=unused-argument
    """Config file callback, that replaces command line options with config file
    values. If command line options are passed, they override config file values.
    """
    # for performance reasons
    import anyconfig  # pylint: disable=import-outside-toplevel

    ctx.default_map = ctx.default_map or {}
    section = ctx.info_name  # the command name, e.g. "run"; the YAML file is keyed by it

    if value:
        config = anyconfig.load(value)[section]
        # entries in default_map act only as option defaults, so options
        # passed explicitly on the command line still win over the file
        ctx.default_map.update(config)

    return value
I gotta read up on click, I don't fully understand this snippet. thanks for the pointer marrrcin 🙂
👍 1
d
this should be smoother when you’re off 0.17.x @Iñigo Hidalgo!
i
soon ™️
🚀 1
n
@marrrcin is right, the callback will update the parameters automatically; --params will still take priority if both are defined. I used to use both: --config for some plugin metadata and --params for Kedro's parameters
i
@Nok Lam Chan this is actually a nice workaround for --params not playing well with dictionaries (in my version of Kedro; I know it was fixed later)
e
So, I would have to add something like this to the config.yml?
run:
  tags: tag1, tag2, tag3
  pipeline: pipeline1
  parallel: true
node_names: node1, node2
  env: env1
  params: job="{'job_id':'98727','timezone':'-8.0'}"
d
params:
  job:
    job_id: '98727'
    timezone: '-8.0'
that is, if you want the two values kept as strings rather than an int and a float
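That params: block sits under the run: key from your example, since the callback loads anyconfig.load(value)[ctx.info_name], i.e. the section named after the command. Then it's picked up with:
kedro run --config config.yml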
e
Is there any other configuration besides pass the config.yml file? It is not recognizing the params in the DataCalog during run time. I get:
ValueError: Pipeline input(s) {'job'} not found in the DataCatalog
m
You cannot have params in the DataCatalog
d
ah yes they’re two different things
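The extra params end up in the catalog under a params: prefix, so if the inference pipeline currently declares job as a plain input, referencing it as params:job should clear the ValueError. A sketch, with predict, model_input, and predictions as stand-in names:
from kedro.pipeline import Pipeline, node

def predict(model_input, job):  # hypothetical node function
    ...

inference = Pipeline(
    [
        # "params:job" resolves to the dict passed via --params/--config;
        # a bare "job" is looked up as a dataset, hence the ValueError above
        node(predict, inputs=["model_input", "params:job"], outputs="predictions"),
    ]
)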
e
Got it! All params should be in the config.yml. I think my problem is more intricate than I initially anticipated... I want to execute pipelines every time there is a request from an API (FastAPI). A service will request an inference and I have to respond quickly, so I have to pass the request to the pipeline, execute it, and respond.
1. Is there a mechanism to keep the Kedro context open, so I can receive and respond to requests without reloading the session for each request?
2. Can I pass the parameters to the pipeline at run time, from the request the service sends?
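For (2), a rough sketch of serving runs behind FastAPI, assuming Kedro 0.18+ APIs (bootstrap_project plus KedroSession.create with extra_params); the route and the job fields are illustrative, not from this thread. bootstrap_project runs once at import time, but a session is still created per request, so it only partially answers (1):
from pathlib import Path

from fastapi import FastAPI
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

PROJECT_PATH = Path(__file__).resolve().parent  # assumes this file sits in the project root

bootstrap_project(PROJECT_PATH)  # one-time Kedro project setup
app = FastAPI()

@app.post("/inference")
def inference(job_id: str, timezone: str):
    # a fresh session per request; only the runtime parameters change
    extra = {"job": {"job_id": job_id, "timezone": timezone}}
    with KedroSession.create(project_path=PROJECT_PATH, extra_params=extra) as session:
        return session.run(pipeline_name="inference")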