# questions
a
Hi all, we're planning to let our app users execute a Kedro pipeline with their own parameters. We anticipate around 20 users. The issue we're facing concerns how to organize this feature: if multiple users run the same pipeline concurrently, there's a risk they might overwrite each other's output files. What are the best practices for managing this situation?
j
Hi Anton, just yesterday we had a conversation with another group of users who had a similar issue. Kedro does not provide any guarantees around executing the same pipeline concurrently. Your best bet is making sure that the outputs of every `kedro run` are versioned, either by adding `versioned: true` to the datasets in your catalog or by using an external versioning solution (Delta tables, MLflow registries, etc.). Does this make sense?
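This is a minimal sketch of why versioning avoids the overwrite problem. It assumes the layout Kedro uses for versioned datasets (`<filepath>/<version>/<filename>`, with a UTC timestamp as the version). This is plain Python written for illustration, not Kedro's own code. `generate_version` and `versioned_path` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional

# Timestamp layout modelled on Kedro's dataset versions (UTC, millisecond precision).
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def generate_version(now: Optional[datetime] = None) -> str:
    """Return a version string such as '2024-03-12T10.20.30.123Z' (hypothetical helper)."""
    now = now or datetime.now(tz=timezone.utc)
    stamp = now.strftime(VERSION_FORMAT)
    # %f gives six microsecond digits; keep the first three (milliseconds) plus the 'Z'.
    return stamp[:-4] + stamp[-1:]


def versioned_path(filepath: str, version: str) -> str:
    """Place each save under its own version folder: <filepath>/<version>/<filename>."""
    filepath = filepath.rstrip("/")
    filename = filepath.rsplit("/", 1)[-1]
    return f"{filepath}/{version}/{filename}"


# Two runs started a few milliseconds apart write to different locations.
path = "s3://common_bucket/some_data/some_parquet.pq"
run_a = versioned_path(path, generate_version(datetime(2024, 3, 12, 10, 20, 30, 123000, tzinfo=timezone.utc)))
run_b = versioned_path(path, generate_version(datetime(2024, 3, 12, 10, 20, 30, 456000, tzinfo=timezone.utc)))
print(run_a)
print(run_b)
```

Millisecond timestamps can still collide if two runs start in the same millisecond. A per-user or per-run prefix in the path removes that remaining risk.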
n
In most projects, each user has their own protected space to work in (think of an S3 bucket or a folder), so the path to your file will be `s3://common_bucket/${user_name}/some_data/some_parquet.pq`
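The `${user_name}` placeholder uses the same syntax as Python's `string.Template`, so the idea can be sketched with the standard library alone. In a Kedro project, the config loader would resolve the placeholder instead. With `OmegaConfigLoader` it would be written, for example, as `${globals:user_name}` or `${runtime_params:user_name}`. `user_path` is a hypothetical helper, not a Kedro API.

```python
from string import Template

CATALOG_PATH = "s3://common_bucket/${user_name}/some_data/some_parquet.pq"


def user_path(template: str, user_name: str) -> str:
    """Fill the per-user placeholder (hypothetical helper for illustration)."""
    # Reject names that would escape the user's folder or collapse into the shared root.
    if not user_name or "/" in user_name or user_name in {".", ".."}:
        raise ValueError(f"invalid user name: {user_name!r}")
    # substitute() raises KeyError if any placeholder is left unfilled.
    return Template(template).substitute(user_name=user_name)


print(user_path(CATALOG_PATH, "anton"))
```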
a
Very helpful, thank you
m
Indeed, we do something similar and we generate a unique identifier for every run that we use in the URI of each output dataset. On the app side, we have a table that maps user to ID (and datestamp) so that we can retrieve a user’s data for the correct run.
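This is a minimal sketch of the run-ID approach just described. An in-memory SQLite table stands in for the web app's backend database. The table name `runs`, its columns, and the `s3://common_bucket/runs/<run_id>/...` layout are hypothetical. In practice, the run ID would be injected into each output dataset's URI, for instance as a runtime parameter.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        " run_id TEXT PRIMARY KEY,"
        " user_name TEXT NOT NULL,"
        " created_at TEXT NOT NULL)"
    )


def register_run(conn: sqlite3.Connection, user_name: str, now: Optional[datetime] = None) -> str:
    """Create a unique run ID and record which user started it, and when."""
    run_id = uuid.uuid4().hex
    # Fixed-format ISO-8601 strings with the same UTC offset sort chronologically as text.
    created_at = (now or datetime.now(tz=timezone.utc)).isoformat(timespec="microseconds")
    conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (run_id, user_name, created_at))
    conn.commit()
    return run_id


def latest_run(conn: sqlite3.Connection, user_name: str) -> Optional[str]:
    """Return the most recent run ID for a user, or None if they have no runs."""
    row = conn.execute(
        "SELECT run_id FROM runs WHERE user_name = ? ORDER BY created_at DESC LIMIT 1",
        (user_name,),
    ).fetchone()
    return row[0] if row else None


def output_uri(run_id: str, dataset: str) -> str:
    # Every output of a run lives under that run's ID, so concurrent runs never collide.
    return f"s3://common_bucket/runs/{run_id}/{dataset}.pq"


conn = sqlite3.connect(":memory:")
init_db(conn)
first = register_run(conn, "anton", datetime(2024, 3, 12, 9, 0, tzinfo=timezone.utc))
second = register_run(conn, "anton", datetime(2024, 3, 12, 10, 0, tzinfo=timezone.utc))
print(output_uri(latest_run(conn, "anton"), "report"))
```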
n
@Matthias Roels Is there anything you wish Kedro could do for you? Is it difficult to manage that mapping?
m
Well, that mapping is managed in the backend of our web app (not in Kedro). We basically built a UI where users can create a `parameters.yaml` (but in a nicer way). Then they can submit a pipeline run with those params. Under the hood, a record is created in the app's backend database and a workflow is triggered (using the Argo Events webhook mechanism). The params are extracted from the body of the HTTP request and mounted in the `KEDRO_ENV` conf folder.
The not-so-nice part is running the pipeline for testing during development: you first need to manually download the config from the UI. It's easy, but it's an extra step if you want a realistic config.
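This is a minimal sketch of the "mount the params into the `KEDRO_ENV` conf folder" step. It assumes a hypothetical webhook schema: a JSON body with a top-level `parameters` key. It also assumes one configuration environment is created per run. Output from `json.dumps` is also valid YAML, so the sketch writes it into a `parameters.yml` file and needs no third-party YAML library. The run would then select that environment, for example via `kedro run --env <env>` or the `KEDRO_ENV` environment variable. `write_run_conf` is a hypothetical helper.

```python
import json
import re
from pathlib import Path

# Allow only a single, safe path component as the environment name.
_ENV_NAME = re.compile(r"[A-Za-z0-9_-]+")


def write_run_conf(request_body: str, conf_root: Path, env: str) -> Path:
    """Write the submitted parameters into conf/<env>/parameters.yml (hypothetical helper)."""
    if not _ENV_NAME.fullmatch(env):
        raise ValueError(f"invalid environment name: {env!r}")
    params = json.loads(request_body)["parameters"]
    env_dir = conf_root / env
    env_dir.mkdir(parents=True, exist_ok=True)
    target = env_dir / "parameters.yml"
    # json.dumps output is also valid YAML, so a YAML config loader can read this file.
    target.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return target
```

For local testing, the same function could be pointed at a downloaded request body to reproduce a realistic config without going through the UI.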
a
@Nok Lam Chan, I think I need a bit more detail on the approach you suggested. Am I understanding correctly that in the Kedro catalog I could add `${user_name}` to each file path, and then pass `user_name` as an argument to each Kedro run? How do I pass this argument?
n
I think most likely you would want to keep it as a `global` variable or an environment variable. Optionally, you can also just use the machine name if that's enough.
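Here is a sketch of the suggestion above. It reads a hypothetical environment variable named `APP_USER_NAME` and falls back to the machine name from `platform.node()`. One caveat: as far as we know, Kedro's `OmegaConfigLoader` only enables the `oc.env` resolver for credentials by default. The resolved value is therefore more naturally handed to Kedro as a runtime parameter (through `extra_params`) than read directly in the catalog.

```python
import os
import platform
from typing import Mapping, Optional


def resolve_user_name(
    environ: Optional[Mapping[str, str]] = None, var: str = "APP_USER_NAME"
) -> str:
    """Pick the user name from an environment variable, else the machine name."""
    env = os.environ if environ is None else environ
    name = env.get(var, "").strip()
    if name:
        return name
    # The machine name is only a safe fallback if each user runs on their own machine.
    return platform.node() or "unknown-host"


print(resolve_user_name({"APP_USER_NAME": "anton"}))
```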
a
Could I pass a `global` variable to `session.run`? Below is the code that I'm using to pass `params`. How do I modify it to pass globals as well?
```python
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


def run_kedro_pipeline(scenario_config: dict, project_path: str):
    """
    Run the Kedro pipeline with the given scenario configuration.

    Args:
        scenario_config (dict): The scenario configuration
        project_path (str): Path to the Kedro project
    """
    # Connect to the Kedro project
    bootstrap_project(project_path)

    with KedroSession.create(
        project_path=project_path, extra_params=scenario_config
    ) as session:
        # Run the Kedro pipeline
        session.run(pipeline_name="reporting_pipeline")
```
If it's impossible to do it with this way of running a pipeline, what would be a better way?
n
You should use `runtime_params`, which matches the semantics.
Cc @Ankita Katiyar, do we have some documentation about how to use `runtime_params` and `globals`, and an explanation of why we don't allow overriding globals with runtime params? https://docs.kedro.org/en/latest/configuration/advanced_configuration.html
a
Then how do I pass `runtime_params`? I understand how to do that with the CLI, but I don't see how to do it with `KedroSession.create`/`session.run`.
a
`globals` are always read from `globals.yml` or whatever config patterns you define in `CONFIG_LOADER_ARGS`. The `extra_params` are essentially the `runtime_params` here.
There's a discussion on how `globals` and `runtime_params` should interact with each other in this issue: https://github.com/kedro-org/kedro/issues/2531, but I don't think we put it in the documentation @Nok Lam Chan
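The toy resolver below makes the `globals` / `runtime_params` distinction concrete. It mimics the `${globals:<key>}` and `${runtime_params:<key>, <default>}` interpolation syntax supported by Kedro's `OmegaConfigLoader`, but it is not Kedro's implementation. It handles only flat keys. It keeps the two namespaces separate, so a runtime parameter never overrides a global. It treats the runtime values as whatever you would pass as `extra_params` to `KedroSession.create`. The `bucket` global is a hypothetical example.

```python
import re
from typing import Any, Mapping

# ${globals:key} or ${runtime_params:key, default}; dotted/nested keys are not handled.
_INTERPOLATION = re.compile(
    r"\$\{(globals|runtime_params):\s*([^,}\s]+)\s*(?:,\s*([^}]*))?\}"
)


def resolve(
    template: str, globals_: Mapping[str, Any], runtime_params: Mapping[str, Any]
) -> str:
    """Replace each interpolation with a value from the matching namespace (toy version)."""
    sources = {"globals": globals_, "runtime_params": runtime_params}

    def substitute(match: "re.Match[str]") -> str:
        source, key, default = match.group(1), match.group(2), match.group(3)
        values = sources[source]
        if key in values:
            return str(values[key])
        if default is not None:
            return default.strip()
        raise KeyError(f"{source}:{key} is not set and has no default")

    return _INTERPOLATION.sub(substitute, template)


globals_yml = {"bucket": "common_bucket"}  # e.g. the contents of globals.yml
extra_params = {"user_name": "anton"}  # e.g. KedroSession.create(extra_params=...)
catalog_path = "s3://${globals:bucket}/${runtime_params:user_name, shared}/some_data/some_parquet.pq"
print(resolve(catalog_path, globals_yml, extra_params))
```

In the session code from the thread, that would mean adding the user name to the dict passed as `extra_params` and referencing it in the catalog as `${runtime_params:user_name}`.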
a
Friends, you are the best as always. Thank you!
n
> The `extra_params` are essentially the `runtime_params` here
Yup, this is the source of the confusion, I guess. We wanted to rename it long ago, but that would be a breaking change. Sorry that we can't make this more obvious.
@Ankita Katiyar I will open an issue for this (update: https://github.com/kedro-org/kedro/issues/3706)
y
And just to elaborate on the kedro-boot part, this is designed specifically for this type of use case. If performance matters, check out kedro-boot!