# questions
m
Hey guys! I recently started using Kedro and really like it so far. I am struggling to connect pipelines based on SQL datasets. Basically, I have a pipeline that combines multiple data sources and writes the result into a Postgres db with a custom dataset. Then I will have different pipelines that load data using pandas.SQLQueryDataset. Each of those pipelines only uses a subset of the data in the database. Kedro does not connect these pipelines, since there is no link between the dataset that writes into the db and the ones that read from it (see toy example below). How do I tell Kedro that “ftwue_db_multi_series” draws data from the same db that “ftwue_db” writes to?
d
So whilst we support SQL this way, it feels awkward because the execution happens outside of the Kedro DAG. To do this nicely you have to pass dummy datasets between the nodes. However! As of the last year we now have Ibis, which is by far the best way to use Kedro with pretty much every SQL backend: https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis
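For anyone finding this later, here is a rough sketch of the dummy-dataset approach, not the only way to do it. It assumes “ftwue_db” and “ftwue_db_multi_series” are the catalog entries from the question; every other name (`combined_sources`, `ftwue_db_written`, `multi_series_features` and both functions) is made up for illustration. The writing node returns an extra flag, and the reading node lists that flag as an input, so Kedro sees a dependency and orders the two nodes.

```python
# Node functions are plain Python. All names except the catalog entries
# "ftwue_db" and "ftwue_db_multi_series" are hypothetical.


def write_to_db(combined):
    """Return the data to save plus a dummy flag.

    Kedro saves the first output through the custom "ftwue_db" dataset.
    The second output exists only so downstream nodes can depend on it.
    """
    return combined, True


def process_multi_series(multi_series, _db_written):
    """Work with the queried data; the dummy flag is accepted and ignored."""
    return multi_series


def create_pipeline(**kwargs):
    # Imported here so the functions above stay usable without Kedro installed.
    from kedro.pipeline import node, pipeline

    return pipeline(
        [
            node(
                write_to_db,
                inputs="combined_sources",
                # "ftwue_db_written" is not in the catalog, so it stays in memory.
                outputs=["ftwue_db", "ftwue_db_written"],
                name="write_to_db",
            ),
            node(
                process_multi_series,
                # The flag forces this node to run after write_to_db.
                inputs=["ftwue_db_multi_series", "ftwue_db_written"],
                outputs="multi_series_features",
                name="process_multi_series",
            ),
        ]
    )
```

The flag carries no data. It only gives the DAG an edge between the node that writes to the db and the node that queries it.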
m
Ah thanks a lot! Connection via dummy dataset worked. Ibis also looks interesting, will have a look!