Mate Scharnitzky
08/09/2023, 4:43 PM
spark.SparkDataSet
with csv? Or would you refrain from using these anchors, as they make the config less readable/explicit, which goes against your philosophy?
_spark_csv: &spark_csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True
my_spark_csv:
  <<: *spark_csv
  filepath: "${_base_path}/${_folders.raw}/name_of_dataset"
The reason I’m asking is that there is very little chance I would change the config (e.g., load_args) for a specific dataset type, so I pretty much need
• pandas csv,
• pandas excel,
• pandas parquet,
• spark csv,
• spark parquet
• and pickle
So I’m wondering whether there is a way to define these in one place, so that in the catalog I would just reference the template. What do you think?
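To make it concrete, here is a rough sketch of what I have in mind, extending the anchor pattern above to the dataset types in the list. The pandas and pickle type names (pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet, pickle.PickleDataSet) are my assumption of the matching dataset classes, and the entry names and filepaths are made up for illustration.

```yaml
# Sketch only: entry names and filepaths below are hypothetical.
# Templates live under keys starting with "_", so they are not
# registered as datasets themselves.
_pandas_csv: &pandas_csv
  type: pandas.CSVDataSet

_pandas_excel: &pandas_excel
  type: pandas.ExcelDataSet

_pandas_parquet: &pandas_parquet
  type: pandas.ParquetDataSet

_spark_csv: &spark_csv
  type: spark.SparkDataSet
  file_format: csv
  load_args:
    sep: ","
    header: True
    inferSchema: True

_spark_parquet: &spark_parquet
  type: spark.SparkDataSet
  file_format: parquet

_pickle: &pickle
  type: pickle.PickleDataSet

# Catalog entries then only name the template and the filepath.
my_pandas_csv:
  <<: *pandas_csv
  filepath: "${_base_path}/${_folders.raw}/name_of_pandas_dataset.csv"

my_spark_parquet:
  <<: *spark_parquet
  filepath: "${_base_path}/${_folders.raw}/name_of_spark_dataset"
```

Two caveats I’m aware of with this approach: YAML anchors only resolve within the file where they are defined, so the templates and the entries using them would need to sit in the same catalog file, and the `<<` merge is shallow, so overriding load_args in an entry would replace the whole mapping rather than merge into it.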
Thank you in advance!
Ankita Katiyar
08/09/2023, 4:53 PM
Mate Scharnitzky
08/09/2023, 4:55 PM
Ankita Katiyar
08/09/2023, 4:59 PM