Kedro #questions

Ankit K

06/02/2025, 3:19 PM

Hi all, I’m working on a Kedro pipeline (using the `kedro-vertexai` plugin, version `0.10.0`) where I need to track each pipeline run in a BigQuery table. We use a `table_suffix` (typically a date or a unique run/session ID) to uniquely identify the data and outputs of each pipeline run, so that results from different runs do not overwrite each other and can be traced back to a specific execution.

The challenge is that the Kedro `session_id` / `KEDRO_CONFIG_RUN_ID` is not available at config load time, so early config logic (like setting a `table_suffix`) has to use a date or a placeholder value. The pipeline currently takes ~2.5 days to run, so this causes inconsistencies whenever nodes run on different days or the pipeline is resumed.

What we have tried so far:

We generated the `table_suffix` from the current date at config load time. If a node runs on a different day or the pipeline is resumed, a new `table_suffix` is generated, which makes it hard to track a single pipeline run.

We also experimented with different Kedro hooks (such as `before_pipeline_run` and `before_node_run`) to set or propagate the run/session ID, but we still could not make the value available everywhere, including during config loading.

What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run? Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this? Any advice or examples would be appreciated!

👀 1

Ankita Katiyar

06/03/2025, 11:00 AM

Hey Ankit, I’ll look into this

👍 1

Dmitry Sorokin

06/03/2025, 11:13 AM

In my opinion, relying on `session_id` won’t work in this case because if the Kedro project is executed via multiple Vertex AI tasks, each task will have its own session ID. This means the session ID won’t be consistent for the entire project run. Instead, it makes more sense to generate a unique `run_id` externally (e.g., in the orchestrator) and inject it into the `table_suffix` manually, ensuring consistency across the whole project run.
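A minimal sketch of that idea, in plain Python with no Kedro imports. The environment variable name `PIPELINE_RUN_ID` and the helpers `new_run_id` and `table_suffix` are hypothetical names chosen for illustration; they are not part of Kedro or `kedro-vertexai`. The assumption is that whatever submits the pipeline generates the ID once and passes the same value to every task.

```python
import os
import re
import uuid
from datetime import datetime, timezone

# Hypothetical variable name: the orchestrator sets it once per pipeline run
# and passes the same value to every task of that run.
RUN_ID_ENV_VAR = "PIPELINE_RUN_ID"


def new_run_id() -> str:
    """Generate a run ID once, outside Kedro (e.g. in the submitting script)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"{stamp}_{uuid.uuid4().hex[:8]}"


def table_suffix(env=None) -> str:
    """Derive the table suffix from the externally supplied run ID.

    Raises instead of falling back to today's date, so a resumed or
    multi-day run cannot silently pick up a new suffix.
    """
    env = os.environ if env is None else env
    run_id = env.get(RUN_ID_ENV_VAR)
    if not run_id:
        raise RuntimeError(
            f"{RUN_ID_ENV_VAR} is not set; generate it once in the orchestrator"
        )
    # Conservative choice: keep only letters, digits and underscores
    # so the suffix is safe to append to a table name.
    return re.sub(r"[^A-Za-z0-9_]", "_", run_id)
```

Because the suffix is a pure function of the injected value, every task that receives the same `PIPELINE_RUN_ID` computes the same suffix regardless of the day it runs on. One way to expose it at config load time, assuming the project uses `OmegaConfigLoader`, would be to register `table_suffix` under `custom_resolvers` in `CONFIG_LOADER_ARGS` in `settings.py`. How the variable reaches each Vertex AI task depends on the plugin and deployment configuration and is not covered by this sketch.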
