let's say i have the following:
- source: a csv rest api with time series data and a 'duration to pull' parameter
- task: weekly preparation of a dataset (historic and recent data) to be used by a BI tool for visualization
what would be the kedroic way to implement this?
my idea so far: define a 'first run / update run' parameter in conf/parameters.yml.
on a first run, pull all the data there is ('duration to pull', in weeks, = nan) and save it as a partitioned dataset in 01_raw, with yearweek as the partition key.
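to make the partitioning step concrete, here is a minimal sketch of how the pulled csv could be split by yearweek. it assumes the csv has an ISO-formatted `timestamp` column and that keys look like `202401` (ISO year + ISO week); both are assumptions for illustration, as are the function names. a kedro node that returns a dict like this can be saved by a partitioned dataset, one partition per key, though with a pandas-based underlying dataset each value would have to be a DataFrame rather than a list of rows.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def yearweek_key(ts: datetime) -> str:
    """Return an ISO year + ISO week key such as '202401' (assumed key format)."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"{iso_year}{iso_week:02d}"


def split_by_yearweek(csv_text: str, time_column: str = "timestamp") -> dict:
    """Group csv rows into {yearweek: [row, ...]}.

    `time_column` is a hypothetical column name; adjust it to the real api.
    """
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row[time_column])
        partitions[yearweek_key(ts)].append(row)
    return dict(partitions)
```

using the ISO year rather than the calendar year matters around new year: 2021-01-01, for example, belongs to ISO week 53 of 2020.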
on an update run, determine the number of weeks to pull by checking what's already downloaded, i.e. the difference between begin (= the most recent yearweek folder name in the partitioned dataset) and end (= the current yearweek), and save into the same partitioned dataset. in fact, i guess this would happen inside the same node as the first run, the only difference being the computed 'duration to pull' parameter.
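the 'duration to pull' computation could look roughly like the sketch below. it assumes partition keys of the form `202401` (ISO year + ISO week) and uses `None` in place of nan to mean 'pull everything'; the function names are made up for illustration. it needs python 3.8+ for `date.fromisocalendar`.

```python
from datetime import date


def parse_yearweek(key: str) -> date:
    """Turn a key like '202401' into the Monday of that ISO week."""
    return date.fromisocalendar(int(key[:4]), int(key[4:]), 1)


def weeks_to_pull(existing_keys, today: date):
    """Return the number of weeks between the newest partition and today.

    Returns None when nothing has been downloaded yet (first run).
    """
    if not existing_keys:
        return None
    latest = max(parse_yearweek(k) for k in existing_keys)
    iso_year, iso_week, _ = today.isocalendar()
    current = date.fromisocalendar(iso_year, iso_week, 1)
    # Compare Mondays so the result is a whole number of weeks,
    # including across year boundaries and 53-week years.
    return (current - latest).days // 7
```

since the result depends only on the existing partition keys, the first run falls out as the 'no keys yet' case. if the newest stored week may be incomplete, adding one to the result would re-pull and overwrite it.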
in another node, the report dataset would be prepared (concat all data, save as multi-sheet xlsx) and saved into 08_reporting.
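a rough sketch of that reporting node, assuming the partitioned dataset is handed to the node as a dict of partition key -> load function (which is how kedro's partitioned dataset loads lazily, as far as i know) and that each partition loads as a pandas DataFrame. the sheet names `all_data` and `latest_week` are hypothetical; the idea is that returning a dict of DataFrames lets an excel dataset write one sheet per key.

```python
from typing import Callable, Dict

import pandas as pd


def build_report(
    partitions: Dict[str, Callable[[], pd.DataFrame]],
) -> Dict[str, pd.DataFrame]:
    """Concatenate all weekly partitions and return one DataFrame per sheet."""
    if not partitions:
        raise ValueError("no partitions to report on")
    # Sort by key so weeks are concatenated in chronological order
    # (works for fixed-width keys such as '202401').
    frames = [load() for _, load in sorted(partitions.items())]
    combined = pd.concat(frames, ignore_index=True)
    # Hypothetical sheet layout: everything, plus the newest week on its own.
    return {"all_data": combined, "latest_week": frames[-1]}
```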
any advice is appreciated!