#questions

Afaque Ahmad

11/24/2022, 9:43 AM
Hi Team, I'm working on a use case where I need to make certain values from a cache available inside the `_load` method of multiple Kedro datasets. How should I go about it? Can we use hooks, or is there anything simpler?

marrrcin

11/24/2022, 9:57 AM
What kind of cache?

datajoely

11/24/2022, 10:05 AM
And is CachedDataSet not helpful here?

Afaque Ahmad

11/24/2022, 10:08 AM
I have to use SparkDataSet. It's a Redis cache holding metadata that is updated dynamically and determines which data is loaded.

datajoely

11/24/2022, 10:09 AM
so hooks + custom dataset feel like the right approach
it sort of goes against the reproducibility-centred patterns we want to encourage, but I’m increasingly of the opinion that we need to think about the right way of doing dynamic/conditional pipelines
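A minimal sketch of the hooks-plus-custom-dataset approach, assuming the Redis metadata can be fetched as a plain dict. `fetch_metadata_from_cache`, `CACHE_VALUES`, `CacheInjectionHooks` and `CacheAwareDataSet` are hypothetical names; the dataset is a plain stand-in rather than a real `SparkDataSet` subclass so the sketch runs without Spark. `hook_impl` is Kedro's hook marker, with a no-op fallback so the code also runs where Kedro isn't installed.

```python
from typing import Any, Dict

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without Kedro; the real marker just tags the method for pluggy.
    def hook_impl(func):
        return func

# Hypothetical module-level holder shared by the hook and the datasets.
CACHE_VALUES: Dict[str, Any] = {}


def fetch_metadata_from_cache() -> Dict[str, Any]:
    """Hypothetical stand-in for a Redis lookup (e.g. with redis-py)."""
    return {"partition_date": "2022-11-24"}


class CacheInjectionHooks:
    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Refresh the shared values once per run, before any node loads a dataset.
        CACHE_VALUES.clear()
        CACHE_VALUES.update(fetch_metadata_from_cache())


class CacheAwareDataSet:
    """Hypothetical stand-in for a custom SparkDataSet subclass."""

    def __init__(self, filepath: str) -> None:
        self._filepath = filepath

    def _load(self) -> str:
        # A real subclass would read Spark data here using the injected value;
        # this stand-in only returns the path it would read.
        return f"{self._filepath}/date={CACHE_VALUES['partition_date']}"
```

In a Kedro project the hook instance would be registered in `settings.py` with `HOOKS = (CacheInjectionHooks(),)`. A module-level holder is the simplest shared state, though it is per process.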

Afaque Ahmad

11/24/2022, 10:13 AM
Gotcha, okay. What are the repercussions of having dynamic/conditional pipelines? (The metadata will have history, so the same behaviour could be reproduced.)
If we were to implement this using hooks, what would be the best place to inject the values?

marrrcin

11/24/2022, 10:24 AM
The `before_dataset_loaded` hook, most probably.

datajoely

11/24/2022, 10:27 AM
Or `before_pipeline_run`, since you have access to everything.

Afaque Ahmad

11/24/2022, 10:35 AM
If I use `before_pipeline_run`, would the datasets already be loaded by then? I need to inject the values before the dataset is loaded, so it seems I have to use `before_dataset_loaded`?
Is `before_dataset_loaded` called every time a dataset is loaded, or only once before all datasets are loaded?
Is there a way to fetch the values from the cache only once in Kedro's lifecycle, rather than on every dataset load?
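If the values are fetched lazily in `before_dataset_loaded`, which Kedro calls before every dataset load, memoizing the fetch keeps it to one Redis round trip per Python process. A sketch under that assumption: `fetch_metadata_from_cache`, `LazyCacheHooks` and the per-dataset metadata layout are hypothetical, `functools.lru_cache` is the standard-library memoizer, and the call counter exists only to show that the fetch runs once.

```python
from functools import lru_cache
from typing import Any, Dict

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without Kedro installed.
    def hook_impl(func):
        return func

FETCH_CALLS = {"count": 0}  # only here to show how often the fetch runs


@lru_cache(maxsize=None)
def fetch_metadata_from_cache() -> Dict[str, Any]:
    """Hypothetical Redis lookup, memoized for the lifetime of the process."""
    FETCH_CALLS["count"] += 1
    return {"events": {"partition_date": "2022-11-24"}}


class LazyCacheHooks:
    def __init__(self) -> None:
        # Hypothetical per-dataset store a custom dataset could read from.
        self.injected: Dict[str, Any] = {}

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        # Called before every dataset load; only the first call reaches the cache.
        metadata = fetch_metadata_from_cache()
        if dataset_name in metadata:
            self.injected[dataset_name] = metadata[dataset_name]
```

Note that `lru_cache` returns the same dict object on every call, so callers should treat it as read-only.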

Nok Lam Chan

11/24/2022, 11:08 AM
`before_pipeline_run` happens before this, I believe.
Pulling out some slightly outdated docs, wait a min
There will be a more up-to-date version and explanation of the lifecycle coming, I will work on it 😅😅😅
ā¤ļø 3
a

Afaque Ahmad

11/24/2022, 11:55 AM
Great. I'll use `before_pipeline_run`. Where can I find all the inputs to the `before_pipeline_run` hook? In the example I only see `run_params`.
You want to look here, not at that example; pluggy will smartly provide only the relevant inputs.
The `pipeline.inputs` attr will be helpful here.
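pluggy passes a hook implementation only the arguments it names, so `before_pipeline_run` can ask for `pipeline` (and `catalog`) instead of, or alongside, `run_params`. In Kedro's API `inputs` is a method: `Pipeline.inputs()` returns the names of the pipeline's free inputs, which can include `parameters` and `params:...` entries. A sketch that fetches metadata only for the datasets the pipeline actually reads; `fetch_metadata_for`, `_FAKE_STORE` and `SelectiveCacheHooks` are hypothetical.

```python
from typing import Any, Dict, Iterable

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without Kedro installed.
    def hook_impl(func):
        return func

# Hypothetical contents of the Redis metadata store.
_FAKE_STORE: Dict[str, Any] = {
    "events": {"partition_date": "2022-11-24"},
    "customers": {"snapshot": "v3"},
}


def fetch_metadata_for(names: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical batched lookup: one round trip for all requested names."""
    return {name: _FAKE_STORE[name] for name in names if name in _FAKE_STORE}


class SelectiveCacheHooks:
    def __init__(self) -> None:
        self.injected: Dict[str, Any] = {}

    @hook_impl
    def before_pipeline_run(self, pipeline: Any) -> None:
        # When called by pluggy, only `pipeline` is supplied because that is all this method requests.
        names = {
            name
            for name in pipeline.inputs()
            if name != "parameters" and not name.startswith("params:")
        }
        self.injected = fetch_metadata_for(names)
```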

Afaque Ahmad

11/24/2022, 11:58 AM
Awesome, thanks so much everyone!!