Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hello team, when I split a pandas dataframe and store it using a partitioned dataset, loading the partitions back together appears to hit schema differences, since a few columns have `nulls`. Is there any workaround that avoids adding another node to put these partitions together, and ideally lets me just read them as a pandas.ParquetDataSet? Perhaps passing the schema of the original dataframe, or even specifying it explicitly?

<@U04HEMKJZDH> you can add `load_args` to your dataset to control how `pd.read_parquet` is called; these get passed through directly

see some docs at <https://docs.kedro.org/en/stable/kedro_datasets.pandas.ParquetDataSet.html#kedro_datasets.pandas.ParquetDataSet.__init__>

Ahhh thank you, *`use_pandas_metadata`* seems like just what I was looking for!