When in a non production environment during development data Kedro #questions

Michel van den Berg

07/08/2023, 6:57 AM

When in a non-production environment - during development - data engineers can validate datasets between nodes to check that each function executed correctly. However, I wonder if you have some guidance on how to do this in a production environment, especially when CI/CD is involved and the pipeline is run not locally but in, say, Airflow. I am aware that there are data quality tools like Great Expectations that can automate this within a CI/CD pipeline. Some questions I have:

• When an automated data quality tool fails the (Airflow) pipeline that is running in a CI/CD pipeline, what is the recommended way of fixing the data and re-running the pipeline? Is it recommended to re-run the whole pipeline, or can we run only a subset of it? Or is it so hard to find out where the data quality issue resides in the overall (master) pipeline that it is better to re-run the pipeline as a whole?
• I understand that it is better to automate data quality testing. However, is there also something like manual data quality testing, especially in the context of a production system? Can we express something like a manual validation step within Kedro, where the pipeline waits until a user presses a button to continue or, in case of a data error, (partially) re-runs the pipeline with newly uploaded, corrected data?

Nok Lam Chan

07/09/2023, 7:45 AM

Both Airflow and Kedro operate on a DAG. The Kedro CLI has several options for running only part of a pipeline, so as long as you know which dataset failed, you only need to re-run from that point, and Kedro will work out the dependencies for you.
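To illustrate the idea of re-running "from that point" in a DAG, here is a minimal pure-Python sketch. It is not Kedro's implementation: the node names and the `DOWNSTREAM` mapping are hypothetical, and it only shows how everything downstream of a failed step can be collected.

```python
from collections import deque

# Hypothetical DAG: each node maps to the nodes that consume its output.
DOWNSTREAM = {
    "load_raw": ["clean"],
    "clean": ["validate", "features"],
    "validate": [],
    "features": ["train"],
    "train": [],
}


def nodes_to_rerun(dag, start):
    """Return `start` plus every node reachable from it, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in dag.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# If the output of "clean" failed validation, only these nodes need to run again.
rerun = nodes_to_rerun(DOWNSTREAM, "clean")
print(rerun)
```

Nodes upstream of the failure (here `load_raw`) are left out, which is why a partial re-run can be cheaper than running the whole pipeline again.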

Nok Lam Chan

07/09/2023, 7:48 AM

https://docs.kedro.org/en/stable/development/commands_reference.html#modifying-a-kedro-run Using the --from-nodes option of kedro run could help.
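As a rough sketch of how this could be scripted, the helper below assembles a `kedro run --from-nodes=...` command line. The function name `build_rerun_command` and the node name `clean_data_node` are made up for illustration; only the `--from-nodes` and `--pipeline` options come from the Kedro CLI.

```python
import shlex


def build_rerun_command(from_nodes, pipeline_name=None):
    """Build a `kedro run` argument list that starts at the given nodes."""
    if not from_nodes:
        raise ValueError("at least one node name is required")
    # Kedro expects a comma-separated list of node names.
    cmd = ["kedro", "run", "--from-nodes=" + ",".join(from_nodes)]
    if pipeline_name:
        # Optionally restrict the run to one registered pipeline.
        cmd.append("--pipeline=" + pipeline_name)
    return cmd


# "clean_data_node" is a hypothetical node name.
cmd = build_rerun_command(["clean_data_node"])
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the project root, assuming Kedro is installed there and the datasets feeding the starting node are persisted rather than held in memory.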

Nok Lam Chan

07/09/2023, 7:48 AM

https://kedro-org.slack.com/archives/C03RKAQ0MGQ/p1687426725047259 This blog may also give you some information.
