Sid Shetty
08/04/2023, 3:29 PM
cpa_llm.blocking_output@partitions:
  type: PartitionedDataSet
  path: data/cpa_llm/blocking_output
  overwrite: True
  filename_suffix: ".parquet"
  dataset:
    type: spark.SparkDataSet
    file_format: parquet
    save_args:
      mode: overwrite
When I read the same data as a Spark dataset, I get the error "AnalysisException: Unable to infer schema for Parquet. It must be specified manually."
But when I read from one particular partition, it infers the schema. I was wondering if there may be a step I am missing here, or if you would recommend some other file format over Parquet to store the files.
Appreciate any help here 😄
datajoely
08/04/2023, 3:30 PM
Sid Shetty
08/04/2023, 3:33 PM