Sabrina Zuraimi
08/22/2024, 2:33 AM
Sabrina Zuraimi
08/22/2024, 2:41 AM
__pandas_parquet: &pandas_parquet
  type: pandas.ParquetDataset
  save_args:
    index: False
Sabrina Zuraimi
08/22/2024, 2:42 AM
p05_model_input.master_table@pandas:
  <<: *pandas_parquet
  filepath: /data/_model_input/master_table.parquet
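For readers unfamiliar with the `<<: *pandas_parquet` merge key used in the entry above, here is a minimal plain-Python sketch of what the entry expands to once the YAML is parsed. It only mimics the merge with dictionaries and is not how Kedro or a YAML parser implements it; the variable names `template` and `entry` are made up for illustration.

```python
# Mapping stored under the &pandas_parquet anchor.
template = {
    "type": "pandas.ParquetDataset",
    "save_args": {"index": False},
}

# Keys written directly in the catalog entry.
own_keys = {"filepath": "/data/_model_input/master_table.parquet"}

# "<<: *pandas_parquet" copies the anchored keys into the entry;
# keys written in the entry itself take precedence on a clash.
entry = {**template, **own_keys}

print(entry)
```

The merge is shallow: if the entry declared its own `save_args`, that whole mapping would replace the template's `save_args` rather than being combined with it.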
Sabrina Zuraimi
08/22/2024, 2:43 AM
catalog.load("p05_model_input.master_table@pandas") returns no file error
Sabrina Zuraimi
08/22/2024, 2:43 AM
catalog.load("p05_model_input.master_table@spark") works
Sabrina Zuraimi
08/22/2024, 2:58 AM
_spark: &spark
  type: spark.SparkDataset
  file_format: parquet
  save_args:
    mode: overwrite
Dmitry Sorokin
08/22/2024, 9:01 AM
Sabrina Zuraimi
08/22/2024, 9:19 AM
Sabrina Zuraimi
08/22/2024, 9:19 AM
Dmitry Sorokin
08/22/2024, 9:39 AM
pyarrow by default, but you can try explicitly adding it to the load_args like this:
load_args:
  engine: pyarrow
Also, try specifying hdfs in the filepath, like so: hdfs:///data/_model...
By anchoring, I mean the &pandas_parquet syntax in your configuration.
Nok Lam Chan
08/23/2024, 4:18 PM
pandas in this case. If possible, can you try to read the file from pure pandas without Kedro first?
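A minimal sketch of the check suggested above, reading the file with pandas alone and bypassing the Kedro catalog. It assumes pandas and pyarrow are installed and reuses the filepath from the catalog entry; the helper names `filesystem_scheme`, `check_local_file` and `read_with_plain_pandas` are made up for illustration. One possible explanation it helps test, which is an assumption rather than something confirmed in the thread, is that a path without a scheme is looked up on the local disk by pandas, while Spark may resolve the same path against HDFS.

```python
from pathlib import Path
from urllib.parse import urlsplit


def filesystem_scheme(filepath: str) -> str:
    """Return the URL scheme of a path, or "file" when none is given."""
    return urlsplit(filepath).scheme or "file"


def check_local_file(filepath: str) -> bool:
    """Return True if the path exists on the local filesystem."""
    return Path(filepath).exists()


def read_with_plain_pandas(filepath: str):
    """Read a Parquet file with pandas only, without the Kedro catalog."""
    import pandas as pd  # imported lazily so the checks above run without pandas

    return pd.read_parquet(filepath, engine="pyarrow")


if __name__ == "__main__":
    # Filepath copied from the catalog entry in the thread.
    path = "/data/_model_input/master_table.parquet"
    print("scheme:", filesystem_scheme(path))
    if check_local_file(path):
        print(read_with_plain_pandas(path).head())
    else:
        print(f"{path} was not found on the local filesystem")
```

If the file is missing locally but the Spark dataset still loads it, that would point toward the data living on HDFS, which is where the `hdfs://` prefix suggestion comes in.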