# questions
s
Hello team, when I split a pandas dataframe and store it using a partitioned dataset, loading the partitions back together runs into schema differences, since a few columns have nulls. Is there any workaround that avoids adding another node to put these partitions together, and ideally lets me just read them as a pandas.ParquetDataSet? Perhaps by passing the schema of the original dataframe, or even specifying it explicitly?
j
@Sid Shetty you can add `load_args` to your dataset to control how `pd.read_parquet` is called; they get passed to it directly.
s
Ahhh, thank you, `use_pandas_metadata` seems like just what I was looking for!