Deepyaman Datta
01/22/2024, 5:03 PMcopy_mode for MemoryDataset currently? If not, would it make sense to support this, or would it make sense to replace the default dataset?Iñigo Hidalgo
01/22/2024, 5:08 PMIñigo Hidalgo
01/22/2024, 5:09 PMDeepyaman Datta
01/22/2024, 5:15 PMDeepyaman Datta
01/22/2024, 5:16 PMIñigo Hidalgo
01/22/2024, 5:17 PMIñigo Hidalgo
01/22/2024, 5:18 PM_infer_copy_mode method struck me as quite an odd implementation too, with a specific handling case for pandas and numpy, and then always just defaulting to deepcopy. It does seem like something that should be user configurable if wantedIñigo Hidalgo
01/22/2024, 5:19 PMIñigo Hidalgo
01/22/2024, 5:41 PMNok Lam Chan
01/22/2024, 5:46 PMcopy_mode on catalog.yml?Iñigo Hidalgo
01/22/2024, 5:46 PMNok Lam Chan
01/22/2024, 5:47 PM# catalog.yml
{_catch_all_}:
type: MemoryDataset
copy_mode: "infer"Nok Lam Chan
01/22/2024, 5:47 PMNok Lam Chan
01/22/2024, 5:48 PMDeepyaman Datta
01/22/2024, 5:48 PMcopy_mode: "assign".Deepyaman Datta
01/22/2024, 5:49 PM_infer_copy_mode or make it more flexible.Iñigo Hidalgo
01/22/2024, 5:49 PMIñigo Hidalgo
01/22/2024, 5:49 PMNok Lam Chan
01/22/2024, 5:50 PMDeepyaman Datta
01/22/2024, 5:51 PMDataFrame type incorrectlyDeepyaman Datta
01/22/2024, 5:51 PMIñigo Hidalgo
01/22/2024, 5:51 PMSubclassing is still a bit uglyhow so?
Deepyaman Datta
01/22/2024, 5:52 PMDeepyaman Datta
01/22/2024, 5:53 PM> Subclassing is still a bit ugly
how so?Sorry, what I meant is, subclassing doesn't make a bad solution, but I think the fact that you need to create a dataset subclass of a dataset a lot of users aren't necessarily actively thinking about, feels non-ideal. Maybe it's fine.
Deepyaman Datta
01/22/2024, 5:54 PMsome sort of mapping of types to default dataset type passed in settings.py or something along those linesSomething like this, or rather a mapping of default copy mode for datatype, could be nice IMO 🙂
Iñigo Hidalgo
01/22/2024, 5:54 PMNok Lam Chan
01/22/2024, 5:56 PMcopy_mode configuration exists. I guess my question is why do you need to change this default?Iñigo Hidalgo
01/22/2024, 5:57 PMNok Lam Chan
01/22/2024, 5:57 PMcopy_mode? I think the Memorydataset default was trying to do that, though it hasn't been updated for years.Deepyaman Datta
01/22/2024, 5:58 PMassignNok Lam Chan
01/22/2024, 6:01 PMNok Lam Chan
01/22/2024, 6:02 PMIñigo Hidalgo
01/22/2024, 6:06 PM__init__ kwarg, and it's more depending on the type of the data being saved *into the dataset at _save time than the type of the dataset itselfNok Lam Chan
01/22/2024, 6:12 PMdef _infer_copy_mode(data: Any) -> str:
if pd and isinstance(data, pd.DataFrame) or np and isinstance(data, np.ndarray):
copy_mode = "copy"
elif type(data).__name__ == "DataFrame":
copy_mode = "assign"
else:
copy_mode = "deepcopy"
return copy_mode
Basically you want this but expose to settings.py so it's customisable?Nok Lam Chan
01/22/2024, 6:14 PMassign mode, I don't see why anyone would want it differently. If so we should just have it embedded in framework rather than creating a new configuration in settings.pyIñigo Hidalgo
01/22/2024, 6:53 PMIñigo Hidalgo
01/22/2024, 7:00 PMevery project is gonna have the same settingsthat is true, but that puts the maintenance and testing burden on kedro to support compatibility with packages it shouldn't even know about. is there an auto import of settings functionality like (i think) there is with hooks?
Nok Lam Chan
01/22/2024, 11:56 PM<http://kedro.io|kedro.io> DataCatalog API.Nok Lam Chan
01/23/2024, 6:36 PMDeepyaman Datta
01/25/2024, 3:32 PMMemoryDataset without changing copy mode in my project right now, but this is using DuckDB backend. Probably, some other tables encapsulate more state (will need to verify).Iñigo Hidalgo
01/25/2024, 3:49 PMlooks like thehttps://github.com/kedro-org/kedro/issues/2374#issuecomment-1593132824 Not sure if that's the issue but i would expect it depends on the connection methodproperty is not part of.con, but later created on the fly https://github.com/ibis-project/ibis/blob/6d4a34f74bf4880d33495ad94d44d58fdd97c2b0/ibis/backends/base/sql/alchemy/__init__.py#L122__init__