Ankit K
06/02/2025, 3:19 PM
I'm working on a Kedro project (kedro-vertexai plugin, version 0.10.0) where I need to track each pipeline run in a BigQuery table. We use a table_suffix (typically a date or unique run/session ID) to uniquely identify data and outputs for each pipeline run, ensuring that results from different runs do not overwrite each other and can be traced back to a specific execution.
The challenge is that the Kedro session_id or KEDRO_CONFIG_RUN_ID is not available at config load time, so early config logic (like setting a table_suffix) uses a date or placeholder value. This can cause inconsistencies, especially if nodes run on different days or the pipeline is resumed. (The pipeline currently takes ~2.5 days to run.)
We tried generating the table_suffix using the current date at config load time, but this led to issues: if a node runs on a different day or the pipeline is resumed, a new table_suffix is generated, causing inconsistencies and making it hard to track a single pipeline run.
We also experimented with different Kedro hooks (such as before_pipeline_run and before_node_run) to set or propagate the run/session ID, but still faced challenges ensuring the value is available everywhere, including during config loading.
What is the best practice in Kedro (with Vertex AI integration) for generating and propagating a unique run/session ID that is available everywhere (including config loading and all nodes), so that all tracking and table suffixes are consistent for a given run?
Should this be set as an environment variable before Kedro starts, or is there a recommended hook or config loader pattern for this?
Any advice or examples would be appreciated!
Ankita Katiyar
06/03/2025, 11:00 AM
Dmitry Sorokin
06/03/2025, 11:13 AM
session_id won’t work in this case because if the Kedro project is executed via multiple Vertex AI tasks, each task will have its own session ID. This means the session ID won’t be consistent for the entire project run. Instead, it makes more sense to generate a unique run_id externally (e.g., in the orchestrator) and inject it into the table_suffix manually, ensuring consistency across the whole project run.
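A minimal sketch of that idea, in plain Python with no Kedro or plugin APIs involved. It assumes the orchestrator exports the run ID once, before any task starts, under an environment variable; the name PIPELINE_RUN_ID and the helper functions are hypothetical and only illustrate the pattern. The generated fallback is meant for local runs only, since generating a value inside each Vertex AI task would bring back the inconsistency described above.

```python
import os
import re
import uuid
from datetime import datetime, timezone

# Hypothetical variable name; the orchestrator would set it once per pipeline run.
RUN_ID_ENV_VAR = "PIPELINE_RUN_ID"


def resolve_run_id(environ=None):
    """Return the externally supplied run ID, or generate one for local runs."""
    env = os.environ if environ is None else environ
    run_id = env.get(RUN_ID_ENV_VAR)
    if not run_id:
        # Fallback for local development only: UTC timestamp plus a short random part.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        run_id = f"{stamp}_{uuid.uuid4().hex[:8]}"
    return run_id


def to_table_suffix(run_id):
    """Keep only letters, digits and underscores so the suffix is safe in a table name."""
    return re.sub(r"[^A-Za-z0-9_]", "_", run_id)


def table_name(base, run_id):
    """Build a per-run table name such as <base>_<suffix>."""
    return f"{base}_{to_table_suffix(run_id)}"
```

Every task that reads the same variable resolves the same suffix, regardless of which day it runs on or whether the pipeline was resumed. How the value reaches the Kedro config (for example a runtime parameter or a resolver) depends on the project setup and is not shown here.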