Hello all, I have a question about Kedro Boot, is ...
# questions
c
Hello all, I have a question about Kedro Boot: is there a way to override a global parameter (globals.yml) when running the Kedro Boot session? Thanks
n
You can use a nested resolver: the idea is that if the runtime parameter is available, it is used instead of the global one.
${runtime_params: xxx, ${globals: xxx}}
Or maybe you can use ${runtime_params:xxx} directly in globals.yml? Not so sure about this, cc @Ankita Katiyar
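The lookup order behind the nested resolver can be pictured in plain Python. This is only an analogy under stated assumptions: plain dictionaries stand in for Kedro's config loader, and `resolve_with_fallback`, `global_values` and the `"default_airline"` value are hypothetical names introduced for illustration.

```python
# Stand-in for "${runtime_params: xxx, ${globals: xxx}}": plain dictionaries
# replace Kedro's config loader here, so this is an analogy, not its code.
_MISSING = object()


def resolve_with_fallback(key, runtime_params, global_values):
    """Return the runtime value for `key` if one was passed, else the global."""
    value = runtime_params.get(key, _MISSING)
    if value is not _MISSING:
        return value
    return global_values[key]


global_values = {"airline": "default_airline"}  # hypothetical globals.yml content

# No runtime parameter given: the global value is used.
from_globals = resolve_with_fallback("airline", {}, global_values)

# Runtime parameter given: it takes precedence over the global value.
from_runtime = resolve_with_fallback("airline", {"airline": "iad"}, global_values)

print(from_globals, from_runtime)
```

The global value acts purely as a default, so nothing in globals.yml has to change when a runtime parameter is supplied.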
c
My problem is more related to Kedro Boot: as far as I know, you cannot pass globals in the session run call:
get_kedro_boot_session("project").run(
    parameters={...},
)
I have a global variable that I use in my catalog.
Edit: I understood your message, I will try it and get back to you, thanks.
n
I think this can be solved with just pure Kedro, though I am not sure if this is kedro-boot specific.
t
Hello Clement! From the Kedro Boot point of view, there is a startup time (Kedro session run) when globals are resolved, and an iteration time (Kedro Boot session run) when itertime_params are resolved. You can try using itertime_params instead of parameters if you want to inject a parameter at each run iteration. Here is how to declare it in your catalog:
your_dataset:
  type: …
  filepath: …${itertime_params:<param_name>}
Then you can run the kedro boot session with:
session.run(itertime_params={"param_name": param_value})
Otherwise, can you elaborate more on your use case, to see if Kedro Boot is a good fit?
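The two resolution phases described above (startup time versus iteration time) can be illustrated with a toy model. It is a sketch only: it uses the standard library's `string.Template` rather than Kedro Boot's resolvers, and `TwoPhaseCatalogEntry`, the `env` key and the path layout are hypothetical.

```python
from string import Template


class TwoPhaseCatalogEntry:
    """Toy model: one value fixed at startup, one filled in at every run."""

    def __init__(self, path_template, global_values):
        # Startup time: global values are substituted once and then frozen.
        # safe_substitute leaves the unknown iteration-time placeholder as-is.
        self._path = Template(path_template).safe_substitute(global_values)

    def path_for(self, itertime_params):
        # Iteration time: the remaining placeholder is filled for each run.
        return Template(self._path).substitute(itertime_params)


entry = TwoPhaseCatalogEntry("data/${env}/${airline}.csv", {"env": "prod"})

# The same booted entry serves several iterations with different values.
first_run = entry.path_for({"airline": "iad"})
second_run = entry.path_for({"airline": "cdg"})

print(first_run, second_run)
```

In this picture, changing `env` would require building a new entry (a new startup), while `airline` can change on every call.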
c
Hi, thanks for your reply. I am not sure I understand the difference between parameters and itertime_params in Kedro Boot. Also, I used itertime_params in my Kedro Boot session and it doesn't seem to be recognized by the resolver.
My catalog:
path: ...-${itertime_params:airline}
My session run (airline comes from my Streamlit app):
if submitted:
    outputs = get_kedro_boot_session("project").run(
        itertime_params={
            "airline": airline,
            ...
        }
    )
This raises:
kedro.io.core.DatasetError: Port could not be cast to integer value as 'airline,None}'. Failed to instantiate dataset 'fdr' of type 'kedro_datasets.partitions.partitioned_dataset.PartitionedDataset'.
Thanks for your help
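One possible reading of the error above, offered as an assumption rather than a confirmed diagnosis: if the dataset receives the path with the placeholder still unresolved, URL parsing treats the text after the first `:` in the host part as a port. The sketch below reproduces that behaviour with the standard library only; the bucket name is made up, and the exact placeholder text the dataset saw is a guess based on the error message.

```python
from urllib.parse import urlsplit

# Hypothetical path: the bucket name is made up, and the placeholder text is
# an assumption about what the dataset received unresolved.
unresolved_path = "gs://some-bucket-${itertime_params:airline,None}"

parts = urlsplit(unresolved_path)
try:
    parts.port  # everything after the first ':' in the host is read as a port
    port_error = None
except ValueError as exc:
    port_error = str(exc)

print(port_error)
```

If this is what happens, the message points at a resolution-order problem (the placeholder reaching the dataset as literal text) rather than at the value of `airline` itself.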
To explain more about my use case: it is a simple Streamlit app that I want to connect to my Kedro pipeline. I want to select an airline in Streamlit and pass it to my Kedro pipelines and catalog (this defines which buckets I fetch from). The airline parameter is used in my catalog, my parameters and also my credentials, so I have made a globals.yml.
Here is an example dataset:
fdr:
  type: partitions.PartitionedDataset
  path: gs://oa-sbfe-generated-fdr-${runtime_params:airline}
  dataset:
    type: fdr_explorer_streamlit.datasets.fdr_dataset.FdrDataset
    load_args:
      low_memory: False
And here is my simple Streamlit app:
from datetime import datetime, timedelta
from pathlib import Path

import streamlit as st
from kedro_boot.app.booter import boot_package, boot_project
from ydata_profiling import ProfileReport

st.set_page_config(layout="wide")


def get_kedro_boot_session(boot_type):
    if boot_type == "project":
        return boot_project(project_path=Path.cwd(), kedro_args={"pipeline": "__default__"})
    else:
        return boot_package(
            package_name="fdr_explorer_streamlit",
            kedro_args={"pipeline": "__default__", "conf_source": "conf"},
        )


# get_kedro_boot_session("project")

with st.container():
    with st.form("Parameters"):
        airline = st.text_input("Choose an airline", placeholder="iad")
        recorded_flight_id = st.text_input("Choose a recorded flight id")
        date_min, date_max = datetime.now().date() - timedelta(days=365), datetime.now().date()
        date_range = st.slider("Choose a date range", date_min, date_max, (date_min, date_max))

        submitted = st.form_submit_button("Explore FDRs")

if submitted:
    # session.run(namespace="download")
    outputs = get_kedro_boot_session("project").run(
        parameters={
            "airline": airline,
            "metadatas_parameters": {
                "recorded_flight_id": recorded_flight_id,
                "date_range": date_range,
            },
        }
    )
    report = ProfileReport(
        outputs.get("processed_fdr").reset_index(),
        title="FDR profiling",
        minimal=True,
    ).to_html()
    st.components.v1.html(report, height=1020, scrolling=True)
With runtime_params, I get this error:
omegaconf.errors.InterpolationResolutionError: Runtime parameter 'airline' not found and no default value provided. full_key: fdr.path object_type=dict
t
I think the error you had with itertime_params is related to the underlying dataset. In any case, neither itertime_params nor parameters would help with passing the airline param to the credentials. Currently, the only way of providing credentials to your Kedro pipeline is to create a session for each iteration. You'll end up with:
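import os

if submitted:
    # Expose the credentials as environment variables before booting the session
    os.environ["your_creds_entry1"] = creds_entry1
    os.environ["your_creds_entry2"] = creds_entry2
    ...
    # Boot a new session on each submit so the runtime params are resolved again
    session = boot_project(project_path=Path.cwd(), kedro_args={"pipeline": "__default__", "params": {"airline": airline, ...}})
    session.run()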
In your catalog.yml and parameters.yml, use ${runtime_params:airline}.
In your credentials.yml, use the oc.env resolver.
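As a rough picture of what an environment-variable resolver does for the credentials, here is a standard-library stand-in. It is not OmegaConf's implementation: `resolve_env`, the `gcs_credentials` and `token` keys, and the secret value are hypothetical, and the dictionary only mimics what a credentials.yml entry might look like.

```python
import os
import re

# Hypothetical credentials.yml content, written as a Python dict for the sketch.
credentials = {
    "gcs_credentials": {"token": "${oc.env:your_creds_entry1}"},
}

_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value):
    """Replace each ${oc.env:NAME} placeholder with os.environ[NAME]."""
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, str):
        return _ENV_PATTERN.sub(lambda match: os.environ[match.group(1)], value)
    return value


# Set the variable before the session is booted, so the lookup can succeed.
os.environ["your_creds_entry1"] = "secret-for-iad"
resolved = resolve_env(credentials)
print(resolved)
```

Because the lookup happens when the configuration is resolved, the environment variables have to be set before the session is created, which matches the order used in the snippet above.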