Deepyaman Datta
01/22/2024, 5:03 PMcopy_mode
for MemoryDataset
currently? If not, would it make sense to support this, or would it make sense to replace the default dataset?Iñigo Hidalgo
01/22/2024, 5:08 PMIñigo Hidalgo
01/22/2024, 5:09 PMDeepyaman Datta
01/22/2024, 5:15 PMDeepyaman Datta
01/22/2024, 5:16 PMIñigo Hidalgo
01/22/2024, 5:17 PMIñigo Hidalgo
01/22/2024, 5:18 PM_infer_copy_mode
method struck me as quite an odd implementation too, with a specific handling case for pandas and numpy, and then always just defaulting to deepcopy. It does seem like something that should be user configurable if wantedIñigo Hidalgo
01/22/2024, 5:19 PMIñigo Hidalgo
01/22/2024, 5:41 PMNok Lam Chan
01/22/2024, 5:46 PMcopy_mode
on catalog.yml?Iñigo Hidalgo
01/22/2024, 5:46 PMNok Lam Chan
01/22/2024, 5:47 PM# catalog.yml
{_catch_all_}:
type: MemoryDataset
copy_mode: "infer"
Nok Lam Chan
01/22/2024, 5:47 PMNok Lam Chan
01/22/2024, 5:48 PMDeepyaman Datta
01/22/2024, 5:48 PMcopy_mode: "assign"
.Deepyaman Datta
01/22/2024, 5:49 PM_infer_copy_mode
or make it more flexible.Iñigo Hidalgo
01/22/2024, 5:49 PMIñigo Hidalgo
01/22/2024, 5:49 PMNok Lam Chan
01/22/2024, 5:50 PMDeepyaman Datta
01/22/2024, 5:51 PMDataFrame
type incorrectlyDeepyaman Datta
01/22/2024, 5:51 PMIñigo Hidalgo
01/22/2024, 5:51 PMSubclassing is still a bit uglyhow so?
Deepyaman Datta
01/22/2024, 5:52 PMDeepyaman Datta
01/22/2024, 5:53 PM> Subclassing is still a bit ugly
how so?Sorry, what I meant is, subclassing doesn't make a bad solution, but I think the fact that you need to create a dataset subclass of a dataset a lot of users aren't necessarily actively thinking about, feels non-ideal. Maybe it's fine.
Deepyaman Datta
01/22/2024, 5:54 PMsome sort of mapping of types to default dataset type passed in settings.py or something along those linesSomething like this, or rather a mapping of default copy mode for datatype, could be nice IMO 🙂
Iñigo Hidalgo
01/22/2024, 5:54 PMNok Lam Chan
01/22/2024, 5:56 PMcopy_mode
configuration exists. I guess my question is why do you need to change this default?Iñigo Hidalgo
01/22/2024, 5:57 PMNok Lam Chan
01/22/2024, 5:57 PMcopy_mode
? I think the Memorydataset default was trying to do that, though it hasn't been updated for years.Deepyaman Datta
01/22/2024, 5:58 PMassign
Nok Lam Chan
01/22/2024, 6:01 PMNok Lam Chan
01/22/2024, 6:02 PMIñigo Hidalgo
01/22/2024, 6:06 PM__init__
kwarg, and it's more depending on the type of the data being saved *into the dataset at _save
time than the type of the dataset itselfNok Lam Chan
01/22/2024, 6:12 PMdef _infer_copy_mode(data: Any) -> str:
if pd and isinstance(data, pd.DataFrame) or np and isinstance(data, np.ndarray):
copy_mode = "copy"
elif type(data).__name__ == "DataFrame":
copy_mode = "assign"
else:
copy_mode = "deepcopy"
return copy_mode
Basically you want this but expose to settings.py
so it's customisable?Nok Lam Chan
01/22/2024, 6:14 PMassign
mode, I don't see why anyone would want it differently. If so we should just have it embedded in framework rather than creating a new configuration in settings.py
Iñigo Hidalgo
01/22/2024, 6:53 PMIñigo Hidalgo
01/22/2024, 7:00 PMevery project is gonna have the same settingsthat is true, but that puts the maintenance and testing burden on kedro to support compatibility with packages it shouldn't even know about. is there an auto import of settings functionality like (i think) there is with hooks?
Nok Lam Chan
01/22/2024, 11:56 PM<http://kedro.io|kedro.io>
DataCatalog API.Nok Lam Chan
01/23/2024, 6:36 PMDeepyaman Datta
01/25/2024, 3:32 PMMemoryDataset
without changing copy mode in my project right now, but this is using DuckDB backend. Probably, some other tables encapsulate more state (will need to verify).Iñigo Hidalgo
01/25/2024, 3:49 PMlooks like thehttps://github.com/kedro-org/kedro/issues/2374#issuecomment-1593132824 Not sure if that's the issue but i would expect it depends on the connection methodproperty is not part of.con
, but later created on the fly https://github.com/ibis-project/ibis/blob/6d4a34f74bf4880d33495ad94d44d58fdd97c2b0/ibis/backends/base/sql/alchemy/__init__.py#L122__init__