#questions

Emilio Gagliardi

09/14/2023, 8:10 PM
Good afternoon kedro peeps 🙂 What is the Kedro-preferred way to pass around more complex objects like a MongoDB client or a Llama Index vector index? I haven't learned how to use the advanced config loaders yet, so I don't know if it's possible to use those advanced techniques alongside the basic loader. In my current project, I'm maintaining a Chroma vector DB (adding/updating documents and embeddings) wrapped by Llama Index. I tried passing the index like a dataset, but that generated a parquet error. I did see someone use the load_obj() utility, but I'm not clear how that works for something like a vector store from Llama Index, which takes a whole bunch of objects and parameters to instantiate. For example, in order to create a Llama Index vector index, you need to pass two other objects that in turn take 4 or 5 parameters each to configure, and that includes a chromadb client. Thanks kindly for any recommendations or wisdom!

marrrcin

09/15/2023, 8:36 AM
Where are you executing Kedro? If you’re just using `kedro run` on a single machine, you can pass around any Python object. The quirk is that by default `MemoryDataSet` infers the copy mode from the data: it uses assign for DataFrames and copy/deep copy for other objects, so for an object like this you should probably set `MemoryDataSet(copy_mode="assign")` explicitly. You can do this via the catalog:
catalog.yml
my_llama_index:
    type: MemoryDataSet
    copy_mode: assign
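
To illustrate why the copy mode matters for something like a database client, here is a minimal standard-library sketch. It does not import Kedro: `FakeClient` and `load_with_copy_mode` are hypothetical names that only mimic the three copy modes mentioned above, assuming the real object holds a resource (here a lock) that cannot be deep-copied.

```python
import copy
import threading


class FakeClient:
    """Hypothetical stand-in for a DB client or vector index that holds a lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # locks cannot be deep-copied


def load_with_copy_mode(obj, copy_mode):
    """Return obj the way each copy mode would hand it to the next node."""
    if copy_mode == "assign":
        return obj  # same object, no copy at all
    if copy_mode == "copy":
        return copy.copy(obj)  # shallow copy, attributes are shared
    if copy_mode == "deepcopy":
        return copy.deepcopy(obj)  # recursive copy of every attribute
    raise ValueError(f"unknown copy_mode: {copy_mode}")


client = FakeClient("chroma")

# "assign" hands over the very same object.
assigned = load_with_copy_mode(client, "assign")
same_object = assigned is client

# "deepcopy" fails because the lock inside the client cannot be copied.
try:
    load_with_copy_mode(client, "deepcopy")
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

print(same_object, deepcopy_failed)
```

With assign, the downstream node receives the identical client, so open connections and in-memory state survive the hand-off.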

Iñigo Hidalgo

09/15/2023, 12:11 PM
Could the new dataset factory functionality somehow help make copy_mode default to assign for a certain `type`? I had a hiccup with that default copy mode for MemoryDataSets with my Ibis dataset.

Emilio Gagliardi

09/15/2023, 8:45 PM
thank you @marrrcin that did the trick.

marrrcin

09/18/2023, 7:04 AM
@Iñigo Hidalgo with dataset factories, when you do:
"{dataset_name}":
    type: MemoryDataSet
    copy_mode: assign
it overrides the “default” dataset, because it’s effectively a “catch-all” dataset
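
A rough sketch of why a bare `"{dataset_name}"` entry behaves as a catch-all. This is not Kedro's actual resolution code: `pattern_to_regex`, `resolve`, and the `companies` entry are hypothetical, and the rule that explicit entries are checked before patterns is an assumption of this sketch.

```python
import re


def pattern_to_regex(pattern):
    """Turn a "{placeholder}"-style pattern into a compiled regex."""
    parts = re.split(r"(\{\w+\})", pattern)
    regex = "".join(
        f"(?P<{part[1:-1]}>.+)" if re.fullmatch(r"\{\w+\}", part) else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{regex}$")


def resolve(name, catalog):
    """Pick the catalog entry for a dataset name (simplified, hypothetical)."""
    if name in catalog:  # assumption: explicit entries win over patterns
        return catalog[name]
    patterns = [key for key in catalog if "{" in key]
    # Patterns with more literal characters are more specific; try them first.
    patterns.sort(key=lambda key: len(re.sub(r"\{\w+\}", "", key)), reverse=True)
    for pattern in patterns:
        if pattern_to_regex(pattern).match(name):
            return catalog[pattern]
    return None


catalog = {
    "companies": {"type": "pandas.CSVDataSet"},  # hypothetical explicit entry
    "{dataset_name}": {"type": "MemoryDataSet", "copy_mode": "assign"},
}

# Any name without its own entry falls through to the catch-all pattern.
print(resolve("my_llama_index", catalog))
print(resolve("companies", catalog))
```

Because the pattern consists of nothing but a placeholder, it matches every dataset name that has no entry of its own, which is what makes it stand in for the default dataset.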

Iñigo Hidalgo

09/18/2023, 2:11 PM
Thanks, can't wait to start using the latest features once we migrate to 0.18+...