Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hey team, I am saving a partitioned dataset of PySpark Parquet data with this catalog entry:
```yaml
cpa_llm.blocking_output@partitions:
  type: PartitionedDataSet
  path: data/cpa_llm/blocking_output
  overwrite: True
  filename_suffix: ".parquet"
  dataset:
    type: spark.SparkDataSet
    file_format: parquet
    save_args:
      mode: overwrite
```
When I read the same data back as a Spark dataset, I get the error `AnalysisException: Unable to infer schema for Parquet. It must be specified manually.`, but when I read from one particular partition, Spark infers the schema. Is there a step I am missing here, or would you recommend a format other than Parquet for storing the files?
Appreciate any help here :smile:

Unfortunately, you can’t wrap a Spark dataset in a `PartitionedDataSet`,

since Spark is already doing something similar under the hood anyway.
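
A minimal sketch of what that could look like, assuming the node returns a single Spark DataFrame rather than a dictionary of partitions: declare the entry as a plain `spark.SparkDataSet` and let Spark split the output itself via `partitionBy`. This relies on `save_args` being forwarded to Spark's `DataFrameWriter.save`, which accepts `partitionBy`. The column name `block_id` is hypothetical, and the entry name drops the `@partitions` suffix purely for illustration.

```yaml
# Sketch only: a plain Spark dataset instead of PartitionedDataSet.
cpa_llm.blocking_output:
  type: spark.SparkDataSet
  filepath: data/cpa_llm/blocking_output
  file_format: parquet
  save_args:
    mode: overwrite
    # Hypothetical column; Spark writes one sub-directory per distinct value.
    partitionBy:
      - block_id
```

Loading the same entry should then read the whole directory back as one DataFrame, since Spark wrote the layout itself and can discover its own partition directories.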