# questions
a
Hi Team, I'm working on a use-case wherein I need to make certain values from a cache available inside the `_load` method of multiple Kedro datasets. How should I go about it? Can we use hooks? Or anything simpler?
✅ 1
m
What kind of cache?
d
And is `CachedDataSet` not helpful here?
a
I have to use `SparkDataSet`. It's a Redis cache whose metadata is updated dynamically based on which data is loaded.
d
so hooks + a custom dataset feel like the right approach
it sort of goes against the reproducibility-centred patterns we want to encourage, but I'm increasingly of the opinion that we need to think about the right way of doing dynamic/conditional pipelines
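For illustration, roughly the shape being suggested: a hook fetches the cache values into a shared store, and a custom dataset's `_load` reads from it. This is a sketch only; `SHARED_METADATA`, `CacheAwareSparkDataSet`, and `metadata_key` are made-up names, not Kedro API.

```python
# Sketch only: SHARED_METADATA and CacheAwareSparkDataSet are invented
# names. A hook fills the dict before loading; _load reads from it.
from typing import Any, Dict

from kedro.extras.datasets.spark import SparkDataSet

# Module-level store: populated by a hook, read by datasets at load time.
SHARED_METADATA: Dict[str, Any] = {}


class CacheAwareSparkDataSet(SparkDataSet):
    """A SparkDataSet whose _load consults values injected by a hook."""

    def __init__(self, metadata_key: str, **kwargs):
        super().__init__(**kwargs)
        self._metadata_key = metadata_key  # which store entry to read

    def _load(self):
        metadata = SHARED_METADATA.get(self._metadata_key, {})
        # ...use `metadata` (e.g. to pick a path or partition) before
        # deferring to the normal Spark load:
        return super()._load()
```

The catalog entry would then point `type:` at this class and pass `metadata_key` alongside the usual `filepath`/`file_format` arguments.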
a
Gotcha, okay. What are the repercussions of having dynamic/conditional pipelines? (The metadata will have history, so the same behaviour could be reproduced.)
If we were to implement this using hooks, what would be the best place to inject the values?
m
the `before_dataset_loaded` hook, most probably
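A per-load sketch of that, assuming a redis-py client, an invented `metadata:<dataset>` key scheme, and the `SHARED_METADATA` store from the dataset sketch above:

```python
import redis
from kedro.framework.hooks import hook_impl

from my_project.metadata_store import SHARED_METADATA  # hypothetical module


class CacheMetadataHooks:
    def __init__(self):
        # decode_responses=True makes redis-py return str instead of bytes
        self._client = redis.Redis(host="localhost", port=6379,
                                   decode_responses=True)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str):
        # Called every time a dataset is about to be loaded, once per load.
        SHARED_METADATA[dataset_name] = self._client.hgetall(
            f"metadata:{dataset_name}"
        )
```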
d
or `before_pipeline_run`, since you have access to everything
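And the fetch-once alternative, under the same assumptions: `before_pipeline_run` fires exactly once per run, before any dataset is loaded, so a single Redis round-trip here covers the whole run.

```python
import redis
from kedro.framework.hooks import hook_impl

from my_project.metadata_store import SHARED_METADATA  # hypothetical module


class CacheMetadataHooks:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Runs exactly once per pipeline run, before any node executes.
        client = redis.Redis(host="localhost", port=6379,
                             decode_responses=True)
        for name in catalog.list():  # every dataset registered in the catalog
            SHARED_METADATA[name] = client.hgetall(f"metadata:{name}")
```

Either hooks class would be registered via `HOOKS = (CacheMetadataHooks(),)` in the project's `settings.py`.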
a
If I use `before_pipeline_run`, won't the datasets already be loaded by that time? I need to inject the values before the dataset is loaded, so it seems I have to use `before_dataset_loaded`?
Is `before_dataset_loaded` called each time before a dataset is loaded, or only once before all datasets are loaded?
Is there a way I could fetch the values from the cache only once in Kedro's lifecycle and avoid doing it on every dataset load?
n
`before_pipeline_run` happens before this, I believe
Pulling out some slightly outdated docs, wait a min
There will be a more up-to-date version and explanation of the lifecycle coming; I will work on it 😅😅😅
ā¤ļø 3
a
Great. I'll use `before_pipeline_run`. Where can I find all the inputs to the `before_pipeline_run` hook? In the example I see only `run_params`.
you want to look here, not at that example; pluggy will smartly provide only the relevant inputs
the `pipeline.inputs()` method will be helpful here
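To illustrate both points: pluggy matches arguments by name against the full spec (`run_params`, `pipeline`, `catalog` for `before_pipeline_run`), so an implementation can declare just the ones it needs, and `inputs()` is a method on `Pipeline`.

```python
from kedro.framework.hooks import hook_impl


class MyHooks:
    @hook_impl
    def before_pipeline_run(self, pipeline, catalog):
        # pluggy injects only the arguments declared here; run_params is
        # simply omitted from the signature.
        # Pipeline.inputs() returns the "free" inputs: dataset names the
        # pipeline consumes but that no node in it produces.
        for dataset_name in pipeline.inputs():
            print(dataset_name)
```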
a
Awesome, thanks so much everyone!!