#questions

Elias

01/31/2023, 5:54 PM
What would be the smartest way to query only data from the last 5 years (counted back from today or from a set end date) from a database through the catalog?

datajoely

01/31/2023, 5:56 PM
SQLQueryDataSet? That said, we do emphasise that this sort of query limits the reproducibility of your pipelines if the underlying data is changing.

Elias

01/31/2023, 5:59 PM
Yes, pandas.SQLQueryDataSet. The requirement is to save memory, and we assume that a limited amount of data (e.g. 5 years) would be enough for model training. The database itself, however, keeps filling with historical data and so keeps growing.
However, we obviously want to take the most recent 5 years…

datajoely

01/31/2023, 6:01 PM
So it’s then a case of expressing that condition in SQL in your relevant dialect:
SELECT *
FROM xxxx
WHERE date > dateadd('years', -5, today())
some variant of that
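One possible variant, as a minimal sketch rather than the thread's own solution: the window bounds are computed in Python and passed as query parameters, so the end of the window can be today or any fixed date. It assumes SQLite with dates stored as ISO-8601 text; the table name `measurements`, its columns, and the helpers `five_year_window` and `load_recent` are hypothetical. With `pandas.SQLQueryDataSet`, the equivalent condition would presumably go in the dataset's `sql` string, written in the database's own dialect.

```python
import sqlite3
from datetime import date
from typing import Optional


def five_year_window(end_date: date, years: int = 5) -> tuple:
    """Return (start, end) as ISO date strings for a window ending at end_date."""
    try:
        start = end_date.replace(year=end_date.year - years)
    except ValueError:
        # end_date is Feb 29 and the start year is not a leap year
        start = end_date.replace(year=end_date.year - years, day=28)
    return start.isoformat(), end_date.isoformat()


# Hypothetical table and columns; ISO-8601 text dates compare correctly as strings.
QUERY = (
    "SELECT id, date FROM measurements "
    "WHERE date > ? AND date <= ? ORDER BY date"
)


def load_recent(conn: sqlite3.Connection, end_date: Optional[date] = None) -> list:
    """Fetch rows from the last five years, counted back from end_date (default: today)."""
    start, end = five_year_window(end_date or date.today())
    return conn.execute(QUERY, (start, end)).fetchall()


# Small in-memory demo with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, "2015-06-01"), (2, "2019-03-15"), (3, "2022-12-31"), (4, "2023-06-01")],
)
rows = load_recent(conn, end_date=date(2023, 1, 31))
print(rows)
```

Pinning `end_date` to a fixed value instead of today also keeps the query result stable between runs, which helps with the reproducibility concern raised earlier in the thread.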

Elias

01/31/2023, 6:02 PM
yeah, that makes sense. Is there a way to add a specific end date instead of today?