Miguel Rodríguez
09/14/2024, 8:27 PM
Nok Lam Chan
09/15/2024, 12:00 AM
Miguel Rodríguez
09/15/2024, 12:05 AM
"{country}.prm_my_dataset":
  type: spark.SparkDataset
  file_format: delta
  credentials: ${globals:datalake_credential}
  save_args:
    mode: overwrite
    mergeSchema: true
    partitionOverwriteMode: dynamic
    partitionBy: ["date_column"]
  filepath: ${globals:versioned_storage_path}${globals:namespace}/data/{country}/03_primary/prm_my_dataset
  metadata:
    kedro-viz:
      layer: primary
Most of my datasets look very similar; only the layer, partitioning, and filepath change.
At 13 lines per entry, and with around a hundred entries, my catalog ends up being quite unreadable.
And if I want to change something at the project level (e.g. add some new metadata to all datasets), I have to do a lot of find-and-replace and end up with an even more complex catalog.
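The only thing I've sketched out so far is pulling the shared fields into a YAML anchor; this is untested, and it assumes the catalog ignores top-level keys starting with an underscore and that anchors/merge keys are resolved by the YAML loader before Kedro reads the entry:
_spark_delta: &spark_delta  # not a dataset: assuming Kedro skips keys starting with "_"
  type: spark.SparkDataset
  file_format: delta
  credentials: ${globals:datalake_credential}
  save_args:
    mode: overwrite
    mergeSchema: true
    partitionOverwriteMode: dynamic
  metadata:
    kedro-viz:
      layer: primary

"{country}.prm_my_dataset":
  <<: *spark_delta  # YAML merge key: pulls in the shared fields above
  filepath: ${globals:versioned_storage_path}${globals:namespace}/data/{country}/03_primary/prm_my_dataset
  save_args:  # the merge is shallow, so a nested map like save_args has to be restated in full when one sub-key differs
    mode: overwrite
    mergeSchema: true
    partitionOverwriteMode: dynamic
    partitionBy: ["date_column"]
Even then, anything nested (save_args, metadata with a per-dataset layer) still gets repeated, so I'm wondering if there's a better way.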
Merel
09/17/2024, 2:23 PM