# questions
a
Hello everyone, I'm facing a bit of a challenge while using Kedro, and I don't know whether there's an obvious solution I'm missing. I want to save a file to an S3 bucket, and that file is then used by the next node. But I don't want the next node to read from the S3 bucket; it should use the output of the previous node directly. Can I achieve this with the data catalog?
d
You essentially need to create two outputs: one that writes to a dataset pointing to an S3 bucket, and another that goes to a `MemoryDataset`. The `MemoryDataset` is then consumed by the next node. There are also more sophisticated ways to do this, e.g. https://github.com/deepyaman/kedro-accelerator (for inspiration; it's not currently maintained and not compatible out of the box with Kedro 0.18.x, AFAIK).
a
Thank you for the response. I did exactly that, but the challenge is that if I want to run the pipeline starting from the next node, I can't... or can I?
l
Is this not what `CachedDataset` does? 🤔 I haven't used it myself, but this seems like the right use case: https://docs.kedro.org/en/stable/kedro.io.CachedDataset.html
🏆 3
d
Ah, nice, I kinda forgot how it worked. Yes, `CachedDataset` is the way to go for this specifically. Thanks @Lodewic van Twillert!!
👍 2
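For reference, a minimal sketch of how `CachedDataset` could be wired up programmatically (the entry name and bucket path are hypothetical; in a project you would normally declare this in `catalog.yml` instead):

```python
from kedro.io import CachedDataset, DataCatalog
from kedro_datasets.pandas import CSVDataset

catalog = DataCatalog(
    {
        # save() writes through to the wrapped S3 dataset AND keeps the data
        # in memory, so a later load() in the same run is served from the
        # in-memory copy instead of round-tripping through S3.
        "clean_df": CachedDataset(
            dataset=CSVDataset(filepath="s3://my-bucket/clean.csv"),  # hypothetical path
        ),
    }
)
```

This also addresses the follow-up above: when the pipeline is started from the next node, the cache is empty, so `CachedDataset` falls back to loading the data from S3.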