Hugo Evers — 11/08/2023, 3:13 PM
kedro-boot allows one to override pipeline inputs dynamically; is it perhaps also possible to override arguments passed to datasets?
I'm asking specifically because kedro-boot seems to be the answer to using kedro with something like FastAPI, but FastAPI is routinely used for CRUD on a database, and kedro has a nice mechanism for handling credentials and such. Actually passing the credentials into a node to access a database is quite ugly.

Merel — 11/08/2023, 3:19 PM

Hugo Evers — 11/08/2023, 3:48 PM

Takieddine Kadiri — 11/08/2023, 4:16 PM
kedro-boot introduces a new kind of parameter, called template params, that are resolved at each run iteration to cover exactly your use case.
You can leverage SQLQueryDataSet. Let's say you have this dataset:

your_dataset:
  type: pandas.SQLQueryDataSet
  sql: SELECT * FROM TABLE WHERE SOME_COLUMN=[[ column_value ]]

Note that column_value is now a template param that will be resolved at iteration time. Template params are defined with [[ ]] Jinja-style delimiters.
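To make the [[ ]] resolution concrete, here is a quick self-contained sketch of how such a template param could be rendered with Jinja2 configured with custom delimiters. This is illustrative only, not kedro-boot's internal code; the SQL string and parameter name come from the example above.

```python
# Sketch: resolving a [[ ]] template param with Jinja2 custom delimiters
# (illustrative only, not kedro-boot's actual implementation).
from jinja2 import Environment

env = Environment(variable_start_string="[[", variable_end_string="]]")
sql_template = "SELECT * FROM TABLE WHERE SOME_COLUMN=[[ column_value ]]"

# At "iteration time", the template param is rendered into the query
rendered = env.from_string(sql_template).render(column_value="42")
print(rendered)  # SELECT * FROM TABLE WHERE SOME_COLUMN=42
```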
Then in your FastAPI code, you can render the column_value template param with a FastAPI path or query parameter. You'll have something like:

@app.get("/your_endpoint/{your_parameter}")
def your_endpoint(your_parameter):
    return kedro_boot_session.run(name="<your_pipeline_view>", template_params={"column_value": your_parameter})
You can adapt this to your exact case. You can also make your SQL dataset more secure by using parametrized queries instead of injecting column_value directly from the web.
This lets you leverage kedro's capabilities for handling backend IO, business logic, and even the application lifecycle (if you opt for the embedded mode), while using a full FastAPI app that handles the controller and serving parts.
Hope this helps :simple_smile:

Hugo Evers — 11/09/2023, 2:15 PM

Takieddine Kadiri — 11/09/2023, 2:56 PM
@app.get("/your_endpoint/{your_parameter}")
def your_endpoint(your_parameter):
    outputs_data = kedro_boot_session.run(name="your_first_pipeline_view")
    return kedro_boot_session.run(name="your_second_pipeline_view", template_params={"column_value": outputs_data})
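On the parametrized-query suggestion above: instead of rendering untrusted web input into the SQL string, bind it as a driver-level query parameter so the value is escaped for you. A minimal sketch using stdlib sqlite3 as a stand-in for the project's database (table and column names are made up):

```python
# Sketch of a parametrized query; sqlite3 stands in for the real database,
# and the table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_column TEXT, payload TEXT)")
conn.execute("INSERT INTO some_table VALUES ('a', 'row-a'), ('b', 'row-b')")

user_input = "a"  # e.g. a FastAPI path or query parameter
# The ? placeholder lets the driver escape the value, preventing SQL injection
rows = conn.execute(
    "SELECT payload FROM some_table WHERE some_column = ?", (user_input,)
).fetchall()
print(rows)  # [('row-a',)]
```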
Your pipeline_registry.py:

from kedro.pipeline import node
from kedro.pipeline.modular_pipeline import pipeline
from kedro_boot.pipeline import app_pipeline

def register_pipelines():
    your_first_pipeline = pipeline([node(your_function, inputs="your_inputs", outputs="your_output")])
    your_second_pipeline = pipeline(...)
    app_first_pipeline = app_pipeline(
        your_first_pipeline,
        name="your_first_pipeline_view",
        outputs="your_output",
    )
    app_second_pipeline = app_pipeline(
        your_second_pipeline,
        name="your_second_pipeline_view",
    )
    return {"__default__": app_first_pipeline + app_second_pipeline}
Correct me if I misunderstood the question; otherwise, let me know if it works for you.

Hugo Evers — 11/09/2023, 3:05 PM