Hi Kedro team I am thinking of streamlining a part of a proj Kedro #questions


George p

03/13/2024, 12:36 AM

Hi Kedro team! I am thinking of streamlining part of a project using kedro-datasets and OmegaConfigLoader. However, my raw datasets are mainly time series that are extended every day with a new value (in a proprietary storage database which I have to query). Do you have any advice on best practices for handling this natural daily extension of the time-series data, so that I avoid re-querying the database for dates that have already been processed while still keeping the project reproducible? (To keep this post brief, I have added more information in the first comment.)

George p

03/13/2024, 12:37 AM

For example, I could have a parameter such as "end_date" to run the project on a specific date range, but as soon as end_date is updated (or set to "today") I would like to read any data points already saved (like a cache) and only query the new dates in the range. Does something like that exist already? Do you know of any such implementations?
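The cache-and-extend behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not a Kedro feature: `load_series`, the JSON cache file, and the `query_db` callable (standing in for the proprietary database query) are all hypothetical names.

```python
import json
from datetime import date, timedelta
from pathlib import Path
from typing import Callable, Dict


def load_series(
    start: date,
    end: date,
    cache_path: Path,
    query_db: Callable[[date, date], Dict[str, float]],
) -> Dict[str, float]:
    """Return {iso_date: value} for start..end (inclusive).

    Only dates missing from the local cache are requested from the database.
    `query_db(first, last)` is a hypothetical callable that returns
    {iso_date: value} for the requested range.
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in days if d.isoformat() not in cache]
    if missing:
        fetched = query_db(missing[0], missing[-1])
        # Never overwrite values that were already saved, so earlier runs
        # stay reproducible even if the database is revised later.
        for key, value in fetched.items():
            cache.setdefault(key, value)
        cache_path.write_text(json.dumps(cache, sort_keys=True))
    return {d.isoformat(): cache[d.isoformat()] for d in days if d.isoformat() in cache}
```

Pinning `end_date` to a fixed value reproduces an earlier run from the cache alone, while moving it forward only queries the new days.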

Deepyaman Datta

03/13/2024, 3:31 AM

The concept of `IncrementalDataset` exists, but it's designed for a file-based dataset: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#incremental-datasets That said, I think it could be extensible to handle a time partition from a database. I don't think there's anything built into Kedro (yet) for incremental processing from a database, but I actually could be wrong here, as I haven't kept abreast of developments in this space (and generally have poor memory).
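To make the checkpoint idea concrete, here is a minimal plain-Python sketch of how it might carry over to date-keyed partitions. It does not use Kedro's API; `new_partitions` and `confirm` are hypothetical helpers, and the assumption is that each partition is identified by an ISO date string, so string order matches date order.

```python
from typing import Dict, Iterable, Optional


def new_partitions(
    partitions: Dict[str, float], checkpoint: Optional[str]
) -> Dict[str, float]:
    """Return only the partitions whose ISO-date key sorts after the checkpoint."""
    return {
        key: partitions[key]
        for key in sorted(partitions)
        if checkpoint is None or key > checkpoint
    }


def confirm(processed: Iterable[str], checkpoint: Optional[str]) -> Optional[str]:
    """Advance the checkpoint to the latest processed key (unchanged if none)."""
    keys = list(processed)
    return max(keys) if keys else checkpoint
```

A run would call `new_partitions` to pick up unprocessed days and, once they are handled successfully, store the result of `confirm` as the next checkpoint.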

Nok Lam Chan

03/13/2024, 9:23 AM

Hi Kedro team! I am thinking of streamlining a part of a project, using kedro-datasets and OmegaConfigLoader.

Are you using a Kedro project or just `kedro-datasets` and the config loader?

George p

03/13/2024, 8:29 PM

Thank you @Deepyaman Datta. I will look into it. @Nok Lam Chan I haven't started the implementation yet; I am more in the "thinking"/mental-modelling phase. What are your thoughts?
