Sebastian Cardona Lozano
03/11/2023, 12:26 AM
VersionNotFoundError: Did not find any versions for SparkDataSet(file_format=parquet, filepath=gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet, load_args={'header': True, 'inferSchema': True}, save_args={}, version=Version(load=None, save='2023-03-10T23.44.07.085Z'))
In the catalog.yml I have this:
master_model_input:
  type: spark.SparkDataSet
  filepath: gs://bdb-gcp-cds-pr-ac-ba-analitica-avanzada/banca-masiva/599_profundizacion/data/05_model_input/master_model_input.parquet  # gs:// URI in Cloud Storage
  file_format: parquet
  layer: model_input
  versioned: True
  load_args:
    header: True
    inferSchema: True
However, the parquet file is generated correctly in GCS (see the image attached).
Thanks for your help! 🙂
Jannic Holzer
03/11/2023, 12:37 AM
SparkDataSet resolves file paths. It does not use fsspec in the same way that other datasets do, which leads to these difficulties. We have it on our radar to fix this, and I'll make sure a fix gets prioritised.
Sebastian Cardona Lozano
03/11/2023, 1:02 AM