# questions
s
I have one more question for you guys. I have a pipeline, `pipeline1`, that uses a dataset `x` as data input. This dataset is a custom dataset class that downloads a set of data from a REST API we have. Multiple nodes use `x` as input. I want to make a test pipeline that wraps `pipeline1` by loading a different dataset (still from a REST API, but with different query parameters) together with additional test nodes that run performance metrics on the results from `pipeline1`. I have implemented this by using the override functionality of `pipeline`: wrapping `pipeline1` in a new pipeline function and giving it an override dictionary to use the test dataset instead of the original dataset, `inputs={x: test_x}`. This seems to work, but I notice that it downloads the data multiple times, which is not preferable since it takes some time to download the dataset from the API each time. It seems like each node that uses `x` in `pipeline1` downloads (loads) the dataset, instead of it being loaded once for the whole test pipeline. Do you know how to prevent the dataset from being loaded for each node? (code in the comments)
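# Assumes a Kedro version that exports all three names from kedro.pipeline (e.g. 0.18.x).
from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline: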

    # cross_section_pipeline = create_cross_section_pipeline()

    cross_section_pipeline = pipeline(
        pipe=create_cross_section_pipeline(),
        inputs={"radar_data": "falcon_test_data"},
    )

    cross_section_plotting = node(
        func=cross_section_visualizer,
        inputs=["concatenated_result", "falcon_test_data"],
        outputs="cross_section_plot",
    )

    reporting_pipeline = pipeline([cross_section_plotting])

    return cross_section_pipeline + reporting_pipeline
d
`CachedDataSet` (https://kedro.readthedocs.io/en/stable/kedro.io.CachedDataSet.html) is a wrapper that caches the data within the lifecycle of a run, but it will not cache between runs.
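To illustrate what "caching within the lifecycle of a run" means, here is a conceptual sketch in plain Python. It is not Kedro's implementation; `CountingSource` and `CachingWrapper` are hypothetical names made up for this example. The wrapper delegates the first load to the wrapped dataset and serves later loads from memory.

```python
class CountingSource:
    """Hypothetical stand-in for a slow dataset; counts how often it is loaded."""

    def __init__(self):
        self.load_calls = 0

    def load(self):
        self.load_calls += 1
        return {"rows": [1, 2, 3]}


class CachingWrapper:
    """Hypothetical wrapper: delegates the first load, then serves from memory."""

    _EMPTY = object()  # sentinel, so that a legitimately loaded None is cached too

    def __init__(self, dataset):
        self._dataset = dataset
        self._cache = self._EMPTY

    def load(self):
        if self._cache is self._EMPTY:
            self._cache = self._dataset.load()
        return self._cache


source = CountingSource()
wrapped = CachingWrapper(source)

# Three consumers ask for the data; only the first one hits the source.
first = wrapped.load()
second = wrapped.load()
third = wrapped.load()
```

The cache lives on the wrapper object, so it disappears when the process ends. That is why this kind of wrapper helps within one run but not between runs.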
s
Hm, yeah, that makes sense. But it seems to load `falcon_test_data` for each node that uses `radar_data` within the `cross_section_pipeline`. Hmmm.... I'll look at your example! Thanks
Ah, I forgot to save it as a `MemoryDataSet`.
But the example I gave above, when run once, should be able to cache `radar_data` for all nodes?
d
Which example? The built-in one or the custom one?
s
Ah, figured it out! I thought that Kedro automatically cached the output of `_load` from my custom dataset, but figured out that I had to take care of this caching myself. Just stored the data in a class variable and it sorted things out! Thanks for the help 😊
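For anyone finding this thread later, the fix described above could look roughly like the sketch below. It is plain Python so that it runs without Kedro. `RestApiDataSet`, `fake_fetch`, and the URL are hypothetical. In a real project the class would subclass Kedro's abstract dataset base class, and the fetch function would make the actual HTTP request. Keying the cache by URL and query parameters is an assumption added here, so that the original and the test dataset do not share an entry.

```python
from typing import Any, Callable, ClassVar, Dict, Tuple


class RestApiDataSet:
    """Hypothetical custom dataset that caches its download in a class variable."""

    # Shared by all instances, so every node in one process sees the same cache.
    _cache: ClassVar[Dict[Tuple, Any]] = {}

    def __init__(
        self,
        url: str,
        params: Dict[str, str],
        fetch: Callable[[str, Dict[str, str]], Any],
    ):
        self._url = url
        self._params = dict(params)
        self._fetch = fetch  # injected so the sketch runs without network access

    def _cache_key(self) -> Tuple:
        return (self._url, tuple(sorted(self._params.items())))

    def _load(self) -> Any:
        key = self._cache_key()
        if key not in self._cache:
            # Only the first load for this URL and parameter set hits the API.
            self._cache[key] = self._fetch(self._url, self._params)
        return self._cache[key]


calls = []


def fake_fetch(url, params):
    """Hypothetical stand-in for the slow REST call; records each invocation."""
    calls.append((url, dict(params)))
    return [{"id": 1}, {"id": 2}]


dataset = RestApiDataSet("https://example.invalid/api", {"set": "test"}, fake_fetch)
results = [dataset._load() for _ in range(3)]  # three nodes, one download
```

One thing to watch: every node receives the same cached object, so a node that mutates it in place changes what the other nodes see.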
d
amazing!