Hiya, is <https://github.com/deepyaman/kedro-accelerator> still ... | Kedro #plugins-integrations

Iñigo Hidalgo

02/12/2024, 5:14 PM

Hiya, is https://github.com/deepyaman/kedro-accelerator still supported? @Deepyaman Datta I assume not, considering its age, but asking can't hurt. If it is no longer functional, do you have any recommendations for replicating it with modern Kedro functionality? Is this the use case that CachedDataset solves? https://docs.kedro.org/en/stable/_modules/kedro/io/cached_dataset.html EDIT: answered myself: yes, CachedDataset is the Kedro-native implementation

Nok Lam Chan

02/12/2024, 5:20 PM

I think they are solving similar problems; however, if you are saving the data, I think the "saving" part is still a blocking operation, while "load" is fetched directly from memory.
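To make the distinction above concrete, here is a minimal sketch of a write-through cache. It is a toy model of the behaviour described in this message, not Kedro's actual CachedDataset code; `JsonFileDataset` and `WriteThroughCache` are hypothetical names invented for illustration.

```python
import json
import tempfile
from pathlib import Path


class JsonFileDataset:
    """Hypothetical stand-in for a disk-backed dataset."""

    def __init__(self, path):
        self.path = Path(path)
        self.disk_reads = 0  # counts how often load() touches the disk

    def save(self, data):
        # Blocks until the file has been written.
        self.path.write_text(json.dumps(data))

    def load(self):
        self.disk_reads += 1
        return json.loads(self.path.read_text())


class WriteThroughCache:
    """Toy wrapper: save goes to disk AND memory, load prefers memory."""

    _EMPTY = object()  # sentinel, so that None can still be cached

    def __init__(self, dataset):
        self.dataset = dataset
        self._cached = self._EMPTY

    def save(self, data):
        self.dataset.save(data)  # still synchronous: the caller waits here
        self._cached = data

    def load(self):
        if self._cached is self._EMPTY:
            self._cached = self.dataset.load()  # only hit the disk on a cache miss
        return self._cached


with tempfile.TemporaryDirectory() as tmp:
    inner = JsonFileDataset(Path(tmp) / "data.json")
    cached = WriteThroughCache(inner)

    cached.save({"rows": [1, 2, 3]})
    first = cached.load()
    second = cached.load()
    file_exists = inner.path.exists()
```

In this sketch the save still pays the full write cost, while both loads are served from memory and never read the file back.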

👍 1

Nok Lam Chan

02/12/2024, 5:22 PM

It's also worth mentioning that if you are not using a "type safe" dataset, there could be differences. For example, a "save" followed by a "load" with a CSVDataset doesn't guarantee you the same result, because of automatic casting. If you are converting to CachedDataset, I would recommend checking that your outputs are identical after that.
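A small sketch of why a CSV round trip is not an identity operation, using only the standard library `csv` module rather than Kedro or pandas. The `infer` helper is a hypothetical, simplified stand-in for automatic casting; real readers such as pandas apply their own inference rules.

```python
import csv
import io

# In-memory data, as a node would hand it to the next node via a cache.
rows = [{"id": "007", "score": 1.5, "active": True}]

# "Save": CSV stores everything as text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "score", "active"])
writer.writeheader()
writer.writerows(rows)

# "Load": every value comes back as a string.
buffer.seek(0)
reloaded = [dict(row) for row in csv.DictReader(buffer)]


def infer(value):
    """Hypothetical type inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# Automatic casting recovers the float, but turns the id "007" into 7
# and leaves the boolean as the string "True".
inferred = [{key: infer(value) for key, value in row.items()} for row in reloaded]
```

Neither `reloaded` nor `inferred` equals the original `rows`, so a pipeline that starts reading a cached in-memory copy instead of a reloaded file can see different values downstream.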

Iñigo Hidalgo

02/12/2024, 5:22 PM

Okay, for me the feature I was looking at was the loading directly from memory instead of loading from disk, so Cache works for me

Iñigo Hidalgo

02/12/2024, 5:22 PM

Also good point about the second, thank you!

Iñigo Hidalgo

02/12/2024, 5:23 PM

Btw @Nok Lam Chan I know you mentioned you've been tweaking the search result weighting on the page: if there is some documentation about the cached dataset I did not find it

👀 1

Iñigo Hidalgo

02/12/2024, 5:23 PM

(other than the entries under the API documentation, I meant some documentation about it in the "main" docu sections)

Nok Lam Chan

02/12/2024, 5:24 PM

https://noklam.github.io/blog/posts/2021-07-02-kedro-datacatalog.html There is still a bug with CachedDataset that I haven't been able to fix.

Nok Lam Chan

02/12/2024, 5:25 PM

I don't recall any specific documentation other than the API docs. IMO it's a useful feature, particularly for interactive workflows, but there hasn't been a lot of development since it was implemented a few years ago.

Nok Lam Chan

02/12/2024, 5:25 PM

feel free to create more github issues :D

Iñigo Hidalgo

02/12/2024, 5:42 PM

Ok, I posted a docs issue just for reference: https://github.com/kedro-org/kedro/issues/3616 For us it's covering a very narrow use case (a badly designed pipeline tbh), so I'm not surprised it's not more widely used

thankyou 1

👍🏼 1

Nok Lam Chan

02/12/2024, 5:45 PM

I think the default should be as efficient as possible; as I understand it, there were issues with making this generic enough. It's also a bit closer to the "orchestrator" mode, where nodes don't communicate through shared memory, which is arguably not necessary for local development.

Deepyaman Datta

02/12/2024, 8:19 PM

@Nok Lam Chan's top answer is the big difference; if you are writing a lot of datasets, the idea is there's no need to block on that. As for maintenance, I maintained it for a while, but there were no users. 🤷 I'm still not sure why. I'm open to updating it if you're going to use it; also, you're Kedro expert enough that you could probably update the hook pretty easily. :)
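The "no need to block" idea can be sketched with a background thread. This is only an illustration under stated assumptions, not how kedro-accelerator or its hook is implemented; `BackgroundSaver` and `slow_write` are hypothetical names, and the `time.sleep` call simulates slow I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_write(store, name, data):
    """Hypothetical slow writer standing in for disk or network I/O."""
    time.sleep(0.2)
    store[name] = data


class BackgroundSaver:
    """Toy model: keep data in memory, persist it on a worker thread."""

    def __init__(self):
        self.memory = {}      # what downstream consumers read from
        self.persisted = {}   # stands in for the on-disk copy
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def save(self, name, data):
        self.memory[name] = data  # immediately available to the next step
        future = self._pool.submit(slow_write, self.persisted, name, data)
        self._pending.append(future)  # the caller does not wait for the write

    def load(self, name):
        return self.memory[name]

    def wait(self):
        # Block until every queued write has finished, re-raising any errors.
        for future in self._pending:
            future.result()
        self._pool.shutdown()


saver = BackgroundSaver()
saver.save("features", [1, 2, 3])
loaded = saver.load("features")  # served from memory while the write is in flight
saver.wait()                     # only here do we pay for the slow write
```

The trade-off in a design like this is that write failures surface later, at `wait()`, rather than at the point where `save` was called.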

😂 2
