Hi all We re planning to enable our app users to execute Ked Kedro #questions


Anton Nikishin

02/27/2024, 7:46 AM

Hi all, we're planning to let our app users execute a Kedro pipeline with their own parameters. We anticipate having around 20 users. The issue we're facing is how to organize this feature: if multiple users run the same pipeline concurrently, there's a risk they might overwrite each other's output files. What are the best practices for managing this situation?

Juan Luis

02/27/2024, 9:28 AM

Hi Anton, just yesterday we had a conversation with another group of users that had a similar issue. Kedro does not provide any guarantees around executing the same pipeline concurrently. Your best bet is making sure that the outputs of every `kedro run` are versioned, either by adding `versioned: true` in your catalog or by using some external versioning solution (Delta tables, MLflow registries, etc.). Does this make sense?
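To illustrate why versioned outputs avoid overwrites, here is a minimal Python sketch of a timestamp-per-save layout, `<filepath>/<version>/<filename>`. This is an illustration only, not Kedro's own code: the helper names `make_version` and `versioned_path` are hypothetical, and the timestamp format is an assumption modelled on the style of version folder Kedro writes. Two runs that start within the same millisecond would still collide under this scheme, which is in line with the caveat above that concurrent runs are not guaranteed to be safe.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_version() -> str:
    """Return a UTC timestamp string usable as a version folder name (hypothetical helper)."""
    now = datetime.now(timezone.utc)
    # Millisecond precision, with dots instead of colons so the name is filesystem-safe.
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"


def versioned_path(filepath: str, version: str) -> str:
    """Build <filepath>/<version>/<filename> so each save lands in its own folder."""
    path = PurePosixPath(filepath)
    return str(path / version / path.name)


if __name__ == "__main__":
    # Each call produces a new folder, so an earlier output is never overwritten.
    print(versioned_path("data/07_model_output/report.csv", make_version()))
```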

Nok Lam Chan

02/27/2024, 2:09 PM

In most projects, each user will have their own protected space to work in, think of an `s3` bucket or a folder.

Nok Lam Chan

02/27/2024, 2:10 PM

So the path to your file will be `s3://common_bucket/${user_name}/some_data/some_parquet.pq`
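As a rough sketch of the per-user prefix idea, the function below builds such a path in plain Python. The function name `user_output_path` and the validation rule are assumptions added for illustration; only the bucket and file names come from the message above.

```python
import re


def user_output_path(user_name: str, relative_path: str,
                     bucket: str = "common_bucket") -> str:
    """Return an S3 URI namespaced by user: s3://<bucket>/<user>/<path>."""
    # Restrict user names to a conservative character set so a name cannot
    # escape its own prefix (for example with "../").
    if not re.fullmatch(r"[A-Za-z0-9_-]+", user_name):
        raise ValueError(f"unsafe user name: {user_name!r}")
    return f"s3://{bucket}/{user_name}/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    print(user_output_path("anton", "some_data/some_parquet.pq"))
```

Because every user writes under a different prefix, two users running the same pipeline at the same time no longer target the same file. Two concurrent runs by the same user would still share a path.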


Anton Nikishin

02/27/2024, 3:07 PM

Very helpful, thank you

👍🏼 1

Matthias Roels

02/27/2024, 4:12 PM

Indeed, we do something similar and we generate a unique identifier for every run that we use in the URI of each output dataset. On the app side, we have a table that maps user to ID (and datestamp) so that we can retrieve a user’s data for the correct run.
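A minimal sketch of what such a user-to-run mapping could look like, using an in-memory SQLite table. The table layout and the function names (`create_registry`, `register_run`, `runs_for_user`) are hypothetical; the actual app backend described above is not shown in this thread.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def create_registry(conn: sqlite3.Connection) -> None:
    """Create the table that maps each run ID to a user and a timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "run_id TEXT PRIMARY KEY, user_name TEXT NOT NULL, created_at TEXT NOT NULL)"
    )


def register_run(conn: sqlite3.Connection, user_name: str) -> str:
    """Generate a unique run ID, record it for the user, and return it."""
    run_id = uuid.uuid4().hex
    created_at = datetime.now(timezone.utc).isoformat()
    conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (run_id, user_name, created_at))
    conn.commit()
    return run_id


def runs_for_user(conn: sqlite3.Connection, user_name: str) -> list[str]:
    """Return the user's run IDs in insertion order."""
    rows = conn.execute(
        "SELECT run_id FROM runs WHERE user_name = ? ORDER BY rowid", (user_name,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_registry(conn)
    run_id = register_run(conn, "anton")
    # The run ID would then be used in the URI of every output dataset.
    print(f"s3://common_bucket/{run_id}/some_data/some_parquet.pq")
```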

👀 1

Nok Lam Chan

02/27/2024, 4:32 PM

@Matthias Roels Is there anything that you wish Kedro could do for you? Is it difficult to manage that mapping?

Matthias Roels

02/28/2024, 7:14 PM

Well, that mapping is managed in the backend of our web app (not in Kedro). We basically built a UI where users can create a parameters.yaml (but in a nicer way). Then they can submit a pipeline with those params. Under the hood, a record is created in the app's backend database and a workflow is triggered (using the Argo Events webhook mechanism), where the params are extracted from the body of the HTTP request and mounted in the `KEDRO_ENV` conf folder.

Matthias Roels

02/28/2024, 7:16 PM

The not-so-nice part is when you want to run the pipeline for testing (during development). Then you need to manually download the config from the UI first. It's easy, but it's an extra step you have to do if you want a realistic config.

Anton Nikishin

02/29/2024, 3:08 PM

@Nok Lam Chan, I think I need a bit more detail on the approach you suggested. Am I understanding correctly that in the Kedro catalog, I could add `${user_name}` to each file path, and then pass `user_name` as an argument to Kedro on each run? How do I pass this argument?


Nok Lam Chan

02/29/2024, 3:43 PM

I think most likely you would want to keep it as a `global` variable or environment variable. Optionally you can also just use the machine name if that's enough.

Anton Nikishin

03/13/2024, 10:57 AM

Could I pass a `global` variable to session.run? Below is the code that I'm using to pass `params`. How do I modify it to pass globals as well?


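from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


def run_kedro_pipeline(scenario_config: dict, project_path: str):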
    """
    Run the Kedro pipeline with the given scenario configuration.

    Args:
        scenario_config (dict): The scenario configuration
        project_path (str): path to Kedro project
    """
    # Connect to the Kedro project
    bootstrap_project(project_path)

    with KedroSession.create(
        project_path=project_path, extra_params=scenario_config
    ) as session:
        # Run the Kedro pipeline
        session.run(pipeline_name="reporting_pipeline")

If it's impossible to do it with this way of running a pipeline, what would be a better way?

Nok Lam Chan

03/13/2024, 11:04 AM

You should use `runtime_params`, which matches the semantics.

Nok Lam Chan

03/13/2024, 11:08 AM

Cc @Ankita Katiyar, do we have some documentation about how to use `runtime_params` and `globals`, and an explanation of why we don't allow overriding globals with runtime params? https://docs.kedro.org/en/latest/configuration/advanced_configuration.html

Nok Lam Chan

03/13/2024, 11:09 AM

You may find something in this thread: https://kedro-org.slack.com/archives/C03RKP2LW64/p1709821665693559

Anton Nikishin

03/13/2024, 11:13 AM

Then how do I pass `runtime_params`? I understand how to do that with the CLI but don't see how to do it with KedroSession create/run.

Ankita Katiyar

03/13/2024, 11:15 AM

`globals` are always read from `globals.yml` or whatever config patterns you define in `CONFIG_LOADER_ARGS`. The `extra_params` are essentially the `runtime_params` here.
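Putting the answers in this thread together, a hedged sketch of passing a per-user value through `extra_params` could look like the following. It assumes the Kedro version discussed here, where `KedroSession.create` accepts `extra_params`, and it assumes the catalog uses the `${runtime_params:...}` resolver of `OmegaConfigLoader`. The helper `build_runtime_params`, the `user_name` key, and the catalog entry in the comment are hypothetical.

```python
def build_runtime_params(scenario_config: dict, user_name: str) -> dict:
    """Merge the scenario parameters with a per-user key (hypothetical helper)."""
    # Copy so the caller's dict is not mutated.
    params = dict(scenario_config)
    params["user_name"] = user_name
    return params


def run_kedro_pipeline(scenario_config: dict, user_name: str, project_path: str) -> None:
    """Run the pipeline with runtime params that include the user name."""
    # Imported lazily so build_runtime_params can be used without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    bootstrap_project(project_path)
    with KedroSession.create(
        project_path=project_path,
        extra_params=build_runtime_params(scenario_config, user_name),
    ) as session:
        session.run(pipeline_name="reporting_pipeline")


# Hypothetical catalog entry (conf/base/catalog.yml) that reads the value:
#
#   report_output:
#     type: pandas.ParquetDataset
#     filepath: s3://common_bucket/${runtime_params:user_name}/some_data/some_parquet.pq
```

Since `globals` are read from configuration files rather than from the session arguments, the runtime value is referenced directly in the catalog instead of trying to override a global.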

Ankita Katiyar

03/13/2024, 11:18 AM

There's a discussion on how `globals` and `runtime_params` should interact with each other in this issue: https://github.com/kedro-org/kedro/issues/2531 but I don't think we put it in the documentation, @Nok Lam Chan

Anton Nikishin

03/13/2024, 11:20 AM

Friends, you are the best as always. Thank you!

❤️ 1


Nok Lam Chan

03/13/2024, 11:42 AM

> The `extra_params` are essentially the `runtime_params` here

Yup, this is the source of confusion, I guess. We wanted to rename it long ago, but that would be a breaking change. Sorry that we cannot make this more obvious.

Nok Lam Chan

03/13/2024, 11:42 AM

@Ankita Katiyar I will open an issue for this (updated: https://github.com/kedro-org/kedro/issues/3706)

👍 2

Yolan Honoré-Rougé

03/13/2024, 2:55 PM

And just to elaborate on the kedro-boot part, this is designed specifically for this type of use case. If performance matters, check out kedro-boot!

👍 1
