Priyanka Patil
08/29/2023, 7:47 PM
Lodewic van Twillert
08/29/2023, 8:12 PM
*if_exists* {'fail', 'replace', 'append'}, default 'fail'
So out of the box, afaik there is no upsert functionality.
Some reasons I can think of for that:
1. How exactly to upsert depends on the database you are using
2. An upsert needs to know which index (key) you are matching against
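To make those two points concrete, here is a rough sketch of how the statement differs per database. The helper `build_upsert_sql` and its parameters are hypothetical (not part of pandas or Kedro), and the placeholders use the `:name` style that `sqlite3` and SQLAlchemy's `text()` accept.

```python
def build_upsert_sql(dialect, table, columns, key_columns):
    """Return a one-row upsert statement for the given dialect (hypothetical helper)."""
    # Non-key columns are the ones overwritten when the key already exists.
    updates = [c for c in columns if c not in key_columns]
    if not updates:
        raise ValueError("need at least one non-key column to update")

    col_list = ", ".join(columns)
    placeholders = ", ".join(f":{c}" for c in columns)
    insert = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"

    if dialect in ("postgresql", "sqlite"):
        # The conflict target must name a primary key or unique index (point 2).
        target = ", ".join(key_columns)
        assignments = ", ".join(f"{c} = excluded.{c}" for c in updates)
        return f"{insert} ON CONFLICT ({target}) DO UPDATE SET {assignments}"
    if dialect == "mysql":
        # MySQL takes no conflict target; whichever unique key collides is used.
        assignments = ", ".join(f"{c} = VALUES({c})" for c in updates)
        return f"{insert} ON DUPLICATE KEY UPDATE {assignments}"
    raise ValueError(f"no upsert template for dialect {dialect!r}")


sqlite_sql = build_upsert_sql("sqlite", "users", ["id", "name"], ["id"])
mysql_sql = build_upsert_sql("mysql", "users", ["id", "name"], ["id"])
print(sqlite_sql)
print(mysql_sql)
```

The same table and key produce different SQL per dialect, which is probably why a generic `if_exists="upsert"` is hard to offer out of the box.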
But you can definitely create a subclass of `SQLTableDataSet` and override the `_save()` method with your own upsert method!
What database do you use? Do you use SQLAlchemy? And do you already have a method that upserts data, or not yet?
Priyanka Patil
08/29/2023, 8:19 PM
Lodewic van Twillert
08/29/2023, 8:23 PM
`pandas.SQLTableDataSet` and use your upsert implementation in the `_save()` method
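A minimal sketch of what such a `_save()` could do, assuming SQLite 3.24+ and a table with a primary key. `UpsertTableSketch` is a hypothetical stand-in that does not import Kedro, so it runs on its own; in a real project the class would inherit from `SQLTableDataSet`, `_save()` would receive a DataFrame, and the rows could come from `df.to_dict("records")`.

```python
import sqlite3


class UpsertTableSketch:
    """Stand-in for a SQLTableDataSet subclass; only _save() is sketched."""

    def __init__(self, conn, table_name, key_column):
        self._conn = conn
        self._table_name = table_name
        self._key_column = key_column

    def _save(self, rows):
        """Insert new rows and update existing ones, matched on the key column.

        `rows` is a list of dicts sharing the same keys, with at least one
        non-key column.
        """
        if not rows:
            return
        columns = list(rows[0])
        updates = [c for c in columns if c != self._key_column]
        sql = (
            f"INSERT INTO {self._table_name} ({', '.join(columns)}) "
            f"VALUES ({', '.join(':' + c for c in columns)}) "
            f"ON CONFLICT ({self._key_column}) DO UPDATE SET "
            + ", ".join(f"{c} = excluded.{c}" for c in updates)
        )
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
dataset = UpsertTableSketch(conn, "users", key_column="id")

dataset._save([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}])
# Second save: id 2 already exists and is updated, id 3 is inserted.
dataset._save([{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}])

result = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(result)
```

Table and column names are interpolated directly into the SQL here, so they should come from trusted configuration (e.g. the catalog entry), never from user input.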
-- One reason this might not entirely conform to Kedro principles is that the output of your pipeline depends on the state of your database, so the data pipeline is not 100% reproducible 🤔
Priyanka Patil
08/29/2023, 8:30 PM
datajoely
08/30/2023, 9:18 AM
`IbisDataSet`, which may be a nicer, more modern way of doing this
https://github.com/inigohidalgo/kedro-ibis-dataset
Iñigo Hidalgo
08/30/2023, 1:24 PM
Cody Peterson
08/30/2023, 2:32 PM
`.sql` or `.raw_sql` (the latter if no records are returned) to achieve whatever behavior you could in SQL