# questions
g
Hi Kedro team! I am thinking of streamlining part of a project using `kedro-datasets` and `OmegaConfigLoader`. However, my raw datasets are mainly time series that are updated every day with a new value (in a proprietary storage database which I have to query). Do you have any advice on best practices for handling the natural daily extension of the time-series data, so that I avoid re-querying the database for dates already fetched while still keeping the project reproducible? (For brevity, I have added more information in the first comment.)
For example, I could have a parameter such as `end_date` to run the project on a specific date range. As soon as `end_date` is updated (or set to "today"), I would like to read any data points already saved (like a cache) and only extend the series for the new dates in the range. Does something like that already exist? Do you know of any such implementations?
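A minimal sketch of the cache-and-extend idea, in plain Python with no Kedro dependency. It assumes a hypothetical `query(start, end)` function standing in for the proprietary database call, returning `{date: value}` for an inclusive range. Only the days outside the cached span are queried. The result is always sliced to `[start, end]`, so rerunning with an older `end_date` reproduces the older output.

```python
from datetime import date, timedelta
from typing import Callable, Dict

# Hypothetical stand-in for the proprietary DB query: returns {date: value}
# for every available day in the inclusive range [start, end].
QueryFn = Callable[[date, date], Dict[date, float]]


def extend_series(cache: Dict[date, float], start: date, end: date,
                  query: QueryFn) -> Dict[date, float]:
    """Return the series for [start, end], querying only days outside the cache.

    `cache` is updated in place with any newly fetched points.
    """
    one_day = timedelta(days=1)
    if not cache:
        cache.update(query(start, end))
    else:
        first, last = min(cache), max(cache)
        if start < first:
            # Backfill days before the cached span.
            cache.update(query(start, first - one_day))
        if end > last:
            # Append only the days after the last cached point.
            cache.update(query(last + one_day, end))
    # Slice to the requested window so results depend only on (start, end).
    return {d: v for d, v in sorted(cache.items()) if start <= d <= end}
```

In a Kedro project, the cache itself would need to be persisted between runs, for example as a local file dataset in the catalog. That wiring is an assumption and is not shown here. This watermark approach also assumes the history never changes retroactively. If the database revises past values, the cached points would go stale.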
d
The concept of `IncrementalDataset` exists, but it's designed for file-based datasets: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#incremental-datasets That said, I think it could be extended to handle a time partition from a database. I don't think there's anything built into Kedro (yet) for incremental processing from a database, but I could be wrong here, as I haven't kept abreast of developments in this space (and generally have poor memory).
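To illustrate how the `IncrementalDataset` checkpoint idea might carry over to daily database partitions, here is a plain-Python sketch. It is not Kedro's API. The `list_dates` and `fetch_day` callables are hypothetical stand-ins for database calls. The checkpoint is kept in memory, whereas a real implementation would persist it, for example in a custom dataset built on Kedro's `AbstractDataset`. As with `IncrementalDataset`, `load()` returns only partitions newer than the checkpoint, and the checkpoint advances only after `confirm()`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass
class TimePartitionedSource:
    """Checkpointed incremental loading over daily partitions (illustrative only)."""

    list_dates: Callable[[], List[date]]  # hypothetical: dates available in the DB
    fetch_day: Callable[[date], float]    # hypothetical: value for one date
    checkpoint: Optional[date] = None     # last confirmed (processed) date
    _pending: Optional[date] = field(default=None, repr=False)

    def load(self) -> Dict[date, float]:
        """Fetch only the dates strictly after the checkpoint."""
        new_dates = sorted(
            d for d in self.list_dates()
            if self.checkpoint is None or d > self.checkpoint
        )
        # Remember the newest loaded date; it becomes the checkpoint on confirm().
        self._pending = new_dates[-1] if new_dates else self.checkpoint
        return {d: self.fetch_day(d) for d in new_dates}

    def confirm(self) -> None:
        """Advance the checkpoint once downstream processing has succeeded."""
        if self._pending is not None:
            self.checkpoint = self._pending
```

Separating `load()` from `confirm()` means a failed run does not advance the checkpoint, so the same new dates are picked up again on the next run.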
n
> Hi Kedro team! I am thinking of streamlining a part of a project, using kedro-datasets and OmegaConfigLoader.
Are you using a Kedro project, or just `kedro-datasets` and the config loader?
g
Thank you @Deepyaman Datta, I will look into it. @Nok Lam Chan, I haven't started on the implementation yet; I am more in the "thinking"/mental-modelling phase. What are your thoughts?