Adding to hugging face recently? I have a PoC here...
# questions
p
Adding to Hugging Face recently? I have a PoC at https://github.com/everycure-org/matrix/tree/feat/kedro-hf-dataset/pipelines/huggingface-dataset-demo and I'm curious to get feedback on it. Happy to bring it to the kedro-plugins repo ofc. @Merel just FYI (and sorry for sleeping on our Google Sheets dataset, I nudged Laurens to pick that back up)
m
Hello @Pascal Brokmeier, good to see you here again 👋 We've got some Hugging Face datasets in kedro-datasets, added by @Juan Luis a while ago. Don't know if there's any overlap or if this could fit in as an addition 🙂
p
yeah, that one is read-only and it doesn't use the new file format
the above is meant to read/write + support spark/polars/pandas
(we're building this because we're getting ready to release our knowledge graph to the public and want to have this well built into our release pipeline)
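For readers following along, here is a minimal sketch of what a read/write dataset with a switchable pandas/polars/Spark engine could look like. It is not the PoC's actual code: the class name `HubDatasetSketch`, its parameters, and the repo id are hypothetical, and it routes everything through pandas for simplicity, which would not scale for large Spark tables. A real Kedro integration would presumably subclass `kedro.io.AbstractDataset`; that is left out so the sketch stays dependency-free at import time.

```python
from dataclasses import dataclass
from typing import Any

SUPPORTED_ENGINES = ("pandas", "polars", "spark")


@dataclass
class HubDatasetSketch:
    """Hypothetical read/write Hugging Face Hub dataset (illustration only)."""

    repo_id: str  # placeholder such as "my-org/my-dataset"
    engine: str = "pandas"
    split: str = "train"

    def __post_init__(self) -> None:
        # Fail early on an unknown engine instead of at load/save time.
        if self.engine not in SUPPORTED_ENGINES:
            raise ValueError(
                f"engine must be one of {SUPPORTED_ENGINES}, got {self.engine!r}"
            )

    def load(self) -> Any:
        # Third-party imports are lazy so the class can be constructed
        # without `datasets`, `polars`, or `pyspark` installed.
        from datasets import load_dataset

        pdf = load_dataset(self.repo_id, split=self.split).to_pandas()
        if self.engine == "polars":
            import polars as pl

            return pl.from_pandas(pdf)
        if self.engine == "spark":
            from pyspark.sql import SparkSession

            return SparkSession.builder.getOrCreate().createDataFrame(pdf)
        return pdf

    def save(self, data: Any) -> None:
        from datasets import Dataset

        # Assumption: normalise every engine to pandas before uploading.
        if self.engine == "polars":
            pdf = data.to_pandas()
        elif self.engine == "spark":
            pdf = data.toPandas()
        else:
            pdf = data
        Dataset.from_pandas(pdf, preserve_index=False).push_to_hub(
            self.repo_id, split=self.split
        )

    def describe(self) -> dict:
        return {"repo_id": self.repo_id, "engine": self.engine, "split": self.split}
```

Usage would look like `HubDatasetSketch("my-org/my-dataset", engine="polars").load()`, with `save(df)` pushing the frame back to the same repo (this requires being logged in to the Hub with write access).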
m
Okay cool, in that case it can maybe replace the older ones
p
OK, I got it a bit more cleaned up, and since we need it, for now we're storing it (badly named) under our GCP datasets: https://github.com/everycure-org/matrix/blob/feat/kedro-hf-dataset/libs/matrix-gcp-datasets/src/matrix_gcp_datasets/huggingface.py
I'll see if our academic devs can take this up and contribute it to the general datasets plugins
m
That would be great 😄