OmegaConf x Catalog (Kedro #questions)

Mate Scharnitzky

08/09/2023, 4:43 PM

[OmegaConf x Catalog] Hi Team, I’ve read the documentation regarding advanced configuration, very helpful! I’m clear on how we can use templates for catalogs. I’m wondering whether there is also a way to replace anchors like the one below, which is created for a specific dataset type (e.g., `spark.SparkDataSet` with csv). Or would you avoid these anchors altogether, since they make the config less readable/explicit, which goes against your philosophy?

```yaml
_spark_csv: &spark_csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True

my_spark_csv:
  <<: *spark_csv
  filepath: "${_base_path}/${_folders.raw}/name_of_dataset"
```

The reason I’m asking is that I would very rarely change the config (e.g., `load_args`) for a specific dataset type. Pretty much all I need is:
• pandas csv
• pandas excel
• pandas parquet
• spark csv
• spark parquet
• pickle

So I’m wondering whether there is a way to define these in one place and just reference the template in the catalog. What do you think? Thank you in advance!
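For reference, here is a minimal sketch of the anchor-only approach extended to the six formats listed above. It assumes Kedro 0.18.x dataset type names (`pandas.CSVDataSet`, `pandas.ExcelDataSet`, `pandas.ParquetDataSet`, `spark.SparkDataSet`, `pickle.PickleDataSet`). It also assumes that the config loader skips top-level keys starting with `_`, which the `_spark_csv` example already relies on. The dataset names `customers` and `transactions` and their file paths are hypothetical.

```yaml
# Shared templates, one per format. Top-level keys starting with "_" are
# assumed to be skipped by the config loader, so they never become datasets.
_pandas_csv: &pandas_csv
  type: pandas.CSVDataSet

_pandas_excel: &pandas_excel
  type: pandas.ExcelDataSet

_pandas_parquet: &pandas_parquet
  type: pandas.ParquetDataSet

_spark_csv: &spark_csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True

_spark_parquet: &spark_parquet
  type: spark.SparkDataSet
  file_format: parquet

_pickle: &pickle
  type: pickle.PickleDataSet

# Every real dataset still needs its own entry; only filepath varies.
# (Hypothetical dataset names and paths.)
customers:
  <<: *pandas_csv
  filepath: data/01_raw/customers.csv

transactions:
  <<: *spark_csv
  filepath: data/01_raw/transactions
```

The anchors remove the repeated `type`/`load_args` lines, but each dataset still needs an explicit entry. Note also that the YAML `<<` merge is shallow: if an entry redefines `load_args`, it replaces the whole `load_args` mapping rather than merging into it.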

Ankita Katiyar

08/09/2023, 4:53 PM

Have you tried using the dataset factories feature we introduced recently? https://docs.kedro.org/en/stable/data/data_catalog.html#load-multiple-datasets-with-similar-configuration-using-dataset-factories
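Below is a minimal sketch of how the linked dataset factories feature could cover the same six formats. It assumes Kedro 0.18.12 or later, where factories were introduced, and the same 0.18.x dataset type names. Each quoted key is a pattern. A dataset name used in a pipeline that matches a pattern, such as `my_spark_csv` from the question, takes its config from that pattern, with `{name}` substituted into `filepath`. The `<name>_<format>` naming convention and the `data/...` paths are hypothetical.

```yaml
# One pattern per format. Keys are quoted because they start with "{".
# A pipeline dataset named e.g. "my_spark_csv" matches "{name}_spark_csv",
# and {name} resolves to "my". (Naming convention and paths are hypothetical.)
"{name}_pandas_csv":
  type: pandas.CSVDataSet
  filepath: data/01_raw/{name}.csv

"{name}_pandas_excel":
  type: pandas.ExcelDataSet
  filepath: data/01_raw/{name}.xlsx

"{name}_pandas_parquet":
  type: pandas.ParquetDataSet
  filepath: data/01_raw/{name}.parquet

"{name}_spark_csv":
  type: spark.SparkDataSet
  filepath: data/01_raw/{name}
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True

"{name}_spark_parquet":
  type: spark.SparkDataSet
  filepath: data/01_raw/{name}
  file_format: parquet

"{name}_pickle":
  type: pickle.PickleDataSet
  filepath: data/06_models/{name}.pkl
```

With these patterns, a node output called `my_spark_csv` would resolve to a `spark.SparkDataSet` reading CSV from `data/01_raw/my` without a dedicated catalog entry. Patterns are only consulted for names that have no explicit entry, so a one-off dataset with different `load_args` can still be declared normally.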

Mate Scharnitzky

08/09/2023, 4:55 PM

Hi Ankita, oh no, I hadn’t heard of this before, how could I have missed it? Thanks, let me look into it.

Mate Scharnitzky

08/09/2023, 4:58 PM

OK, it was released 8 days ago, so that’s not a long time 🙂

Ankita Katiyar

08/09/2023, 4:59 PM

Yeah, it was quite recent! Let me know if this solves your problem! 😄
