# questions
d
hi team, just a quick question. Let's say I have output O1 from node1, with the catalog configured so the content of O1 is saved to CSV. node2 uses O1 as input. The current behaviour is that node2 reloads the data from the O1 file instead of from memory (this is expected, I assume, due to the catalog configuration). Is there any way I could still have O1 saved as CSV (easier for business people to check data quality) while having O1 loaded into node2 through memory (faster, and no need to deal with CSV save/load tricks)? Thanks
e
You could define two output datasets from node1. One of the two datasets would be saved as a CSV (i.e. there is a catalog entry for CSV), while the other is simply referenced by node2 (i.e. no catalog entry, and therefore it's an "in-memory" dataset). Let me know if that makes sense.
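A minimal sketch of this two-output approach (all dataset and function names here are illustrative, not from the thread): node1 returns the same DataFrame twice, only one of the two outputs has a catalog entry, and the entry-less one stays in memory.

```yaml
# catalog.yml — only the CSV copy is declared.
# "model_input_mem" has no entry, so Kedro treats it as an
# in-memory dataset and node2 reads it without touching disk.
model_input_csv:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/model_input.csv

# Corresponding pipeline wiring (shown here as comments for context;
# the node function would simply `return df, df`):
#   node(prepare, inputs="raw_data",
#        outputs=["model_input_csv", "model_input_mem"])
#   node(train, inputs="model_input_mem", outputs="model")
```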
d
Yes!! I haven't maintained it due to (perceived) lack of interest, but https://github.com/deepyaman/kedro-accelerator solves exactly this problem. 🙂
(I'm guessing it would require minimal changes to work with 0.18.x, but nobody's asked; if this solves your problem and you'd use it, I'm happy to try and find some time to update it, or of course happy to accept PRs)
d
Cool, thanks all, I will start with option 1 (logically, it should work without issue) and move to a more programmatic solution (accelerator) if I get extra time.
Good to know the accelerator was built so I know I was not asking for a meaningless use case 🙂
m
Hello, could this solution (https://kedro.readthedocs.io/en/stable/data/data_catalog.html#transcode-datasets) be interesting? With df@csv using pandas.CSVDataSet and df@memory using MemoryDataSet?
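For reference, a transcoding entry along the lines of the linked docs might look like the sketch below (names are hypothetical; whether an @memory variant behaves as intended here is worth verifying against the docs, since transcoding is usually shown with two persisted formats):

```yaml
# catalog.yml — transcoding sketch.
# df@csv persists to disk so business users can inspect it;
# df@memory is the variant node2 would consume from memory.
df@csv:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/df.csv

df@memory:
  type: MemoryDataSet
```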
d
Thanks Massinissa, good point. I think this is similar to what johnson mentioned above: fundamentally, you need to have two outputs.