Vishal Pandey (09/15/2024, 7:39 PM):
I want to store pipeline metadata (pipeline_id, name, description) in a database whenever the pipeline is executed for the first time in production. This simply requires creating a DB connection and running an insert query. I would like to better understand the following:
1. Where can I store pipeline-specific metadata in a Kedro project? Say we have three pipelines defined in a project: data_extraction, data_processing, model_training.
2. How can we read all this metadata, then create a DB connection and finally execute the insert operation?
3. Lastly, what is the best place to achieve such tasks in a Kedro project? Is it hooks? For example, we could run this logic after the context is created.
Laurens Vijnck (09/16/2024, 7:43 AM):
Kedro has an after_context_created hook in which you could implement the logic.
Question is, should you? It seems like you'd like tracking across pipeline runs? Have you considered the MLflow plugin? Seems this could provide a good starting point:
https://github.com/Galileo-Galilei/kedro-mlflow
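A minimal sketch of the hook route, for reference: after_context_created is a real Kedro hook spec, but the sqlite connection, the "pipeline" table and the inserted values below are placeholder assumptions, not something from this thread.

```python
# Sketch: insert pipeline metadata from an after_context_created hook.
# The sqlite backend, "pipeline" table and values are illustrative assumptions.
import sqlite3

from kedro.framework.hooks import hook_impl


class MetadataHooks:
    @hook_impl
    def after_context_created(self, context):
        conn = sqlite3.connect("pipeline_metadata.db")  # swap for your production DB driver
        conn.execute(
            "INSERT INTO pipeline (pipeline_name, description, created_by) VALUES (?, ?, ?)",
            ("data_processing", "cleans and joins raw data", "vishal"),
        )
        conn.commit()
        conn.close()


# Register the hook in src/<your_package>/settings.py:
# HOOKS = (MetadataHooks(),)
```

The "only on the first production run" check (for example an existence query or upsert before the insert) is left out of the sketch.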
Vishal Pandey (09/16/2024, 8:14 AM):
pipeline_id (PK)
pipeline_name
description
created_date
created_by

run_details:
run_id (PK)
triggered_at
time_taken
triggered_by
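For concreteness, DDL along these lines would match the proposed schema. The first table's name is not given in the thread, so "pipeline" is assumed, and sqlite types are used purely for illustration.

```python
# Sketch: create the two proposed tables (sqlite used for illustration only).
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS pipeline (          -- table name assumed; only run_details is named above
    pipeline_id   INTEGER PRIMARY KEY,
    pipeline_name TEXT,
    description   TEXT,
    created_date  TEXT,
    created_by    TEXT
);

CREATE TABLE IF NOT EXISTS run_details (
    run_id       TEXT PRIMARY KEY,             -- e.g. the Kubeflow run id discussed later
    triggered_at TEXT,
    time_taken   REAL,
    triggered_by TEXT
);
"""

conn = sqlite3.connect("pipeline_metadata.db")
conn.executescript(DDL)
conn.close()
```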
Laurens Vijnck (09/16/2024, 8:30 AM):
[...] kedro run invocation [...]
Vishal Pandey (09/16/2024, 12:01 PM):
1. kedro-kubeflow init: this creates a config file, as described in https://kedro-kubeflow.readthedocs.io/en/0.7.4/source/02_installation/02_configuration.html
2. kedro-kubeflow upload_pipeline: this uses the generated config, converts the Kedro DAG into a Kubeflow-compatible DAG, and publishes the pipeline on Kubeflow.
If you look carefully, there are run-related configs present in the generated config file. Any heads-up from here, as you already know our use case?
Vishal Pandey (09/16/2024, 12:07 PM):
With kedro-kubeflow run, I am not sure how we can get unique run_ids.
Laurens Vijnck (09/16/2024, 12:09 PM):
There is KFP_RUN_ID, which is set as an environment variable.
Vishal Pandey (09/16/2024, 12:11 PM):
os.environ.get("KFP_RUN_ID") should do the job?
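Assuming KFP_RUN_ID is indeed exported in the pod environment, a sketch of how that could feed the run_details table: before_pipeline_run and after_pipeline_run are real Kedro hook specs, while the sqlite backend, table and column values carry over the assumptions from the schema above.

```python
# Sketch: record one run_details row per run, keyed by the Kubeflow run id.
import os
import sqlite3
import time
from datetime import datetime, timezone

from kedro.framework.hooks import hook_impl


class RunTrackingHooks:
    def __init__(self):
        self._start = None
        self._triggered_at = None

    @hook_impl
    def before_pipeline_run(self, run_params):
        # Capture when the run was triggered so time_taken can be computed later.
        self._start = time.time()
        self._triggered_at = datetime.now(timezone.utc).isoformat()

    @hook_impl
    def after_pipeline_run(self, run_params):
        run_id = os.environ.get("KFP_RUN_ID", "local")  # unique per Kubeflow run, "local" otherwise
        time_taken = time.time() - (self._start or time.time())
        conn = sqlite3.connect("pipeline_metadata.db")  # swap for your production DB driver
        conn.execute(
            "INSERT INTO run_details (run_id, triggered_at, time_taken, triggered_by) "
            "VALUES (?, ?, ?, ?)",
            (run_id, self._triggered_at, time_taken, os.environ.get("USER", "unknown")),
        )
        conn.commit()
        conn.close()
```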
Laurens Vijnck (09/16/2024, 12:37 PM):
${oc.env:KFP_RUN_ID}
Vishal Pandey (09/19/2024, 2:38 PM):
We can use oc.env to access this run_id in the Kedro project, probably set it in globals.yml, and then reuse it anywhere needed, like in parameters.yml or catalog.yml; or we can use oc.env directly wherever it is permitted across the files in conf/.
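The oc.env resolver can be checked in isolation with OmegaConf, which Kedro's OmegaConfigLoader builds on; where ${oc.env:...} is allowed inside conf/ depends on the Kedro version and loader settings. The value below is a placeholder.

```python
# Sketch: how the oc.env resolver expands KFP_RUN_ID.
# In a Kedro project the same interpolation would live in conf/, e.g.
#   run_id: ${oc.env:KFP_RUN_ID}
# in globals.yml or parameters.yml; here OmegaConf is used directly to show the resolution.
import os
from omegaconf import OmegaConf

os.environ.setdefault("KFP_RUN_ID", "demo-run-123")  # placeholder for local testing

cfg = OmegaConf.create({"run_id": "${oc.env:KFP_RUN_ID}"})
print(OmegaConf.to_container(cfg, resolve=True))  # {'run_id': 'demo-run-123'}
```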