# questions
e
Hi everyone, is it possible to pass a dictionary as the value for the --params argument on the command line? If so, how should I do it? I tried:
kedro run --pipeline=inference --params=job="{'job_id':'98727','timezone':'-8.0'},source=mwb,version=1"
but I got a ParserError...
ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 1:
    {'job_id':'98727'
    ^
expected ',' or '}', but got '<stream end>'
  in "<unicode string>", line 1, column 18:
    {'job_id':'98727'
                     ^
d
So we have a --config option so you can pass in a YAML file with nested config like you suggest here
n
You can either use the dot syntax or the --config option.
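(The ParserError happens because --params first splits its value on commas, so the dict is cut down to {'job_id':'98727' before it ever reaches the YAML parser; hence the "stream end" at column 18.) With the dot syntax there are no braces to split. On recent Kedro versions (0.18+, where the key separator is = rather than the older :) that would look something like:
kedro run --pipeline=inference --params="job.job_id=98727,job.timezone=-8.0,source=mwb,version=1"
(check kedro run --help on your install, since the exact separator is version-dependent).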
👍 2
i
Oh wow I had not seen this option!! Is it a relatively recent addition?
d
the --config piece?
i
yeah
d
it’s actually been in there a while
but we don’t do a great job of advertising it
m
It’s the ONLY option so far that accepts lists/dicts 🙈
👀 1
d
and IIRC there is some inconsistency in how overrides work between config/cli
can you do omegaconf resolving for CLI parsing?
i
@click.option(
    "--config",
    "-c",
    type=click.Path(exists=True, dir_okay=False, resolve_path=True),
    help=CONFIG_FILE_HELP,
    callback=_config_file_callback,
)
@click.option("--params", type=str, default="", help=PARAMS_ARG_HELP, callback=_split_params)
def run(
    tag,
    env,
    parallel,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_version,
    pipeline,
    config,
    params,
):
    """Run the pipeline."""
    if parallel and runner:
        raise KedroCliError(
            "Both --parallel and --runner options cannot be used together. " "Please use either --parallel or --runner."
        )
    runner = runner or "SequentialRunner"
    if parallel:
        runner = "ParallelRunner"
    runner_class = load_obj(runner, "kedro.runner")

    tag = _get_values_as_tuple(tag) if tag else tag
    node_names = _get_values_as_tuple(node_names) if node_names else node_names

    package_name = str(Path(__file__).resolve().parent.name)
    with KedroSession.create(package_name, env=env, extra_params=params) as session:
        session.run(
            tags=tag,
            runner=runner_class(is_async=is_async),
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_version,
            pipeline_name=pipeline,
        )
This is the run function in my cli.py. It looks like we're somehow discarding that config, right?
m
_config_file_callback updates the context
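Concretely, it copies the YAML values into click's ctx.default_map, and defaults always lose to anything passed explicitly on the command line. A minimal standalone sketch of that click mechanism (illustrative only, not Kedro's code):
import click

@click.group()
def cli():
    """Toy CLI demonstrating click's default_map."""

@cli.command()
@click.option("--pipeline", default=None)
def run(pipeline):
    click.echo(f"pipeline={pipeline}")

if __name__ == "__main__":
    # default_map seeds per-command defaults, keyed by command name,
    # which is what _config_file_callback's ctx.default_map.update(config) does;
    # an explicit --pipeline on the command line still overrides it.
    cli(default_map={"run": {"pipeline": "inference"}})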
i
def _config_file_callback(ctx, param, value):  # pylint: disable=unused-argument
    """Config file callback, that replaces command line options with config file
    values. If command line options are passed, they override config file values.
    """
    # for performance reasons
    import anyconfig  # pylint: disable=import-outside-toplevel

    ctx.default_map = ctx.default_map or {}
    section = ctx.info_name  # the command name, e.g. "run"; the YAML file is keyed by it

    if value:
        config = anyconfig.load(value)[section]
        # entries in default_map act only as option defaults, so options
        # passed explicitly on the command line still win over the file
        ctx.default_map.update(config)

    return value
I gotta read up on click, I don't fully understand this snippet. thanks for the pointer marrrcin 🙂
👍 1
d
this should be smoother when you’re off 0.17.x @Iñigo Hidalgo!
i
soon ™️
🚀 1
n
@marrrcin is right, the callback will update the parameters automatically; --params will still take priority if both are defined. I used to use both: --config for some plugin metadata and --params for Kedro's parameters
i
@Nok Lam Chan this is actually a nice workaround for --params not playing well with dictionaries (in my version of Kedro; I know it was fixed later)
e
So, I would have to add something like this to the config.yml?
run:
  tags: tag1, tag2, tag3
  pipeline: pipeline1
  parallel: true
node_names: node1, node2
  env: env1
  params: job="{'job_id':'98727','timezone':'-8.0'}"
d
params:
  job:
    job_id: '98727'
    timezone: '-8.0'
that is, if you want the two values kept as strings rather than an int and a float
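That params: block sits under the run: key from your example, since the callback loads anyconfig.load(value)[ctx.info_name], i.e. the section named after the command. Then it's picked up with:
kedro run --config config.yml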
e
Is there any other configuration besides pass the config.yml file? It is not recognizing the params in the DataCalog during run time. I get:
ValueError: Pipeline input(s) {'job'} not found in the DataCatalog
m
You cannot have params in the DataCatalog
d
ah yes they’re two different things
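The extra params end up in the catalog under a params: prefix, so if the inference pipeline currently declares job as a plain input, referencing it as params:job should clear the ValueError. A sketch, with predict, model_input, and predictions as stand-in names:
from kedro.pipeline import Pipeline, node

def predict(model_input, job):  # hypothetical node function
    ...

inference = Pipeline(
    [
        # "params:job" resolves to the dict passed via --params/--config;
        # a bare "job" is looked up as a dataset, hence the ValueError above
        node(predict, inputs=["model_input", "params:job"], outputs="predictions"),
    ]
)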
e
Got it! All params should be in the config.yml. I think my problem is more intricate than I initially anticipated... I want to execute pipelines every time there is a request from an API (FastAPI). A service will request an inference and I have to respond quickly, so I have to pass the request to the pipeline, execute it, and respond.
1. Is there a mechanism to keep the Kedro context open, so I can receive and respond to requests without reloading the session for each request?
2. Can I pass the parameters to the pipeline at run time, from the request the service sends?
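For (2), a rough sketch of serving runs behind FastAPI, assuming Kedro 0.18+ APIs (bootstrap_project plus KedroSession.create with extra_params); the route and the job fields are illustrative, not from this thread. bootstrap_project runs once at import time, but a session is still created per request, so it only partially answers (1):
from pathlib import Path

from fastapi import FastAPI
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

PROJECT_PATH = Path(__file__).resolve().parent  # assumes this file sits in the project root

bootstrap_project(PROJECT_PATH)  # one-time Kedro project setup
app = FastAPI()

@app.post("/inference")
def inference(job_id: str, timezone: str):
    # a fresh session per request; only the runtime parameters change
    extra = {"job": {"job_id": job_id, "timezone": timezone}}
    with KedroSession.create(project_path=PROJECT_PATH, extra_params=extra) as session:
        return session.run(pipeline_name="inference")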