# questions
a
Hi Team, I'm working on a use-case wherein I need to make certain values from a cache available inside the `_load` method of multiple Kedro datasets. How should I go about it? Can we use hooks? Or anything simpler?
✅ 1
m
What kind of cache?
d
And is `CachedDataSet` not helpful here?
a
I have to use `SparkDataSet`. It's a Redis cache whose metadata is updated dynamically based on which data is loaded.
d
so hooks + a custom dataset feel like the right approach
it sort of goes against the reproducibility-centred patterns we want to encourage, but I'm increasingly of the opinion that we need to think about the right way of doing dynamic/conditional pipelines
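For illustration, roughly the shape being suggested: a hook fetches the cache values into a shared store, and a custom dataset's `_load` reads from it. This is a sketch only; `SHARED_METADATA`, `CacheAwareSparkDataSet`, and `metadata_key` are made-up names, not Kedro API.

```python
# Sketch only: SHARED_METADATA and CacheAwareSparkDataSet are invented
# names. A hook fills the dict before loading; _load reads from it.
from typing import Any, Dict

from kedro.extras.datasets.spark import SparkDataSet

# Module-level store: populated by a hook, read by datasets at load time.
SHARED_METADATA: Dict[str, Any] = {}


class CacheAwareSparkDataSet(SparkDataSet):
    """A SparkDataSet whose _load consults values injected by a hook."""

    def __init__(self, metadata_key: str, **kwargs):
        super().__init__(**kwargs)
        self._metadata_key = metadata_key  # which store entry to read

    def _load(self):
        metadata = SHARED_METADATA.get(self._metadata_key, {})
        # ...use `metadata` (e.g. to pick a path or partition) before
        # deferring to the normal Spark load:
        return super()._load()
```

The catalog entry would then point `type:` at this class and pass `metadata_key` alongside the usual `filepath`/`file_format` arguments.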
a
Gotcha, okay. What are the repercussions of having dynamic/conditional pipelines? (The metadata will have history, so the same behaviour could be reproduced.)
If we were to implement this using hooks, what would be the best place to inject the values?
m
the `before_dataset_loaded` hook, most probably
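A per-load sketch of that, assuming a redis-py client, an invented `metadata:<dataset>` key scheme, and the `SHARED_METADATA` store from the dataset sketch above:

```python
import redis
from kedro.framework.hooks import hook_impl

from my_project.metadata_store import SHARED_METADATA  # hypothetical module


class CacheMetadataHooks:
    def __init__(self):
        # decode_responses=True makes redis-py return str instead of bytes
        self._client = redis.Redis(host="localhost", port=6379,
                                   decode_responses=True)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str):
        # Called every time a dataset is about to be loaded, once per load.
        SHARED_METADATA[dataset_name] = self._client.hgetall(
            f"metadata:{dataset_name}"
        )
```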
d
or `before_pipeline_run`, since you have access to everything
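And the fetch-once alternative, under the same assumptions: `before_pipeline_run` fires exactly once per run, before any dataset is loaded, so a single Redis round-trip here covers the whole run.

```python
import redis
from kedro.framework.hooks import hook_impl

from my_project.metadata_store import SHARED_METADATA  # hypothetical module


class CacheMetadataHooks:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Runs exactly once per pipeline run, before any node executes.
        client = redis.Redis(host="localhost", port=6379,
                             decode_responses=True)
        for name in catalog.list():  # every dataset registered in the catalog
            SHARED_METADATA[name] = client.hgetall(f"metadata:{name}")
```

Either hooks class would be registered via `HOOKS = (CacheMetadataHooks(),)` in the project's `settings.py`.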
a
If I use `before_pipeline_run`, won't the datasets already be loaded by that time? I need to inject the values before the dataset is loaded, so it seems I have to use `before_dataset_loaded`?
Is `before_dataset_loaded` called each time before a dataset is loaded, or only once before all datasets are loaded?
Is there a way I could fetch the values from the cache only once in Kedro's lifecycle and avoid doing it on every dataset load?
n
`before_pipeline_run` happens before this, I believe
Pulling out some slightly outdated docs, wait a min
There will be a more up-to-date version and explanation of the lifecycle coming; I will work on it 😅😅😅
ā¤ļø 3
a
Great. I'll use `before_pipeline_run`. Where can I find all the inputs to the `before_pipeline_run` hook? In the example I see only `run_params`.
you want to look here, not at that example; pluggy will smartly provide only the relevant inputs
the `pipeline.inputs()` method will be helpful here
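To illustrate both points: pluggy matches arguments by name against the full spec (`run_params`, `pipeline`, `catalog` for `before_pipeline_run`), so an implementation can declare just the ones it needs, and `inputs()` is a method on `Pipeline`.

```python
from kedro.framework.hooks import hook_impl


class MyHooks:
    @hook_impl
    def before_pipeline_run(self, pipeline, catalog):
        # pluggy injects only the arguments declared here; run_params is
        # simply omitted from the signature.
        # Pipeline.inputs() returns the "free" inputs: dataset names the
        # pipeline consumes but that no node in it produces.
        for dataset_name in pipeline.inputs():
            print(dataset_name)
```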
a
Awesome, thanks so much everyone!!