Solomon Yu
02/22/2023, 2:18 PM
my_dataset:
  type: pandas.CSVDataSet
  filepath: path-to-my-file.csv
  load_args:
    parse_dates: ['col_3']
    dtype: dtypes_dict_var
This is so that catalog.yml doesn't get too long.
I'd like this dtype dict to live within conf/base/parameters/my_pipeline.yml, as:
dtypes_dict_var: {
  "col_1": int,
  "col_2": str,
  "col_3": DateTime<'Y-m-d'>,  # assumes YAML API syntax will be converted to datetime object
}
Another question here is how to pass a datetime object type in to load_args:dtype.
I'd like this dtype dict to affect only the loading of my_dataset, and not be used as a global var if possible. A separate case could be that I'd like to load the same dataset with different dtypes in different pipelines, which could utilise TemplatedConfigLoader.
Passing in certain parameters doesn't seem very straightforward tbh :/
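On the datetime part of the question, one option that stays within plain YAML is to write the dtype values as type-name strings and leave date columns to parse_dates, since CSVDataSet forwards load_args to pandas.read_csv. A rough sketch, assuming pandas is installed and using a hypothetical in-memory CSV in place of path-to-my-file.csv:

```python
import io

import pandas as pd

# load_args as they would look after YAML parsing: plain strings only,
# no Python type objects. Column names follow the example above.
load_args = {
    "parse_dates": ["col_3"],
    "dtype": {"col_1": "int64", "col_2": "str"},
}

# Hypothetical in-memory stand-in for path-to-my-file.csv.
csv_text = "col_1,col_2,col_3\n1,a,2023-02-22\n2,b,2023-02-23\n"

# The keyword arguments are passed straight through to pandas.read_csv.
df = pd.read_csv(io.StringIO(csv_text), **load_args)
print(df.dtypes)
```

Here col_3 is converted by parse_dates rather than by dtype, so no datetime object has to be expressed in YAML.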
Thanks in advance!
Nok Lam Chan
02/23/2023, 5:39 AM
dtypes_dict_var will live in the same catalog.yml though
2. Use TemplatedConfigLoader, this means it will be a global var though.
Can I ask why you want to do this? It seems like you are moving part of the dataset definition to another file. Is that because you only want to overwrite part of the dataset definition? IMO parameters should be something that gets passed into nodes; the parameters about a dataset should be in catalog.yml instead.
Merel
02/23/2023, 10:55 AM
OmegaConfigLoader which we released on Tuesday. It allows you to do variable interpolation for parameters with omegaconf. See our docs for more info: https://kedro.readthedocs.io/en/stable/kedro_project_setup/configuration.html#templating-for-parameters
Solomon Yu
02/23/2023, 12:16 PM
dtypes_dict_var albeit having to increase memory usage.
Extending from this, there are other types/formats I'd like to strictly enforce on read, e.g. DateTime<'Y-m-d'> (YAML syntax, but this errors). This relates to my other question about passing datetime or other object types, such as numpy dtypes, into this dtypes_dict_var. Furthermore, if I'd like to use the pyarrow engine in ParquetDataSet for loading, and pass options to the engine, I may also need to pass in pyarrow objects.
(The openpyxl parser problem is best dealt with outside of here, and ideally I wouldn't have to deal with this if I didn't receive a data dump in Excel...)
Nok Lam Chan
02/23/2023, 3:33 PM
OmegaConfigLoader resolver in the future, you could potentially create any class/object from the config.
Note: This is just something I hacked together now; you can also create a custom OmegaConfigLoader to achieve something similar.
Solomon Yu
02/23/2023, 3:40 PM
Nok Lam Chan
02/23/2023, 3:40 PM
Solomon Yu
02/23/2023, 3:42 PM
Merel
02/23/2023, 3:42 PM
OmegaConf and we’d advise people to just create their custom ones themselves
Solomon Yu
02/23/2023, 3:45 PM
Merel
02/23/2023, 3:48 PM
OmegaConfigLoader on Tuesday 🙂 I expect in the 0.19.X series we’ll add a lot more features to it.
Nok Lam Chan
02/23/2023, 4:11 PM