Zubin Roy
07/14/2025, 10:40 AM
df_2:
  type: polars.LazyPolarsDataset
  filepath: data/01_raw/test.parquet
  file_format: parquet
Error:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Juan Luis
07/14/2025, 10:57 AM
$ ipython
In [1]: %load_ext kedro.ipython
In [2]: df = catalog.load("df_2")
In [3]: df.write_parquet("data/01_raw/test.parquet")
that way you can see if the problem is in Kedro or Polars
Zubin Roy
07/14/2025, 11:30 AM
if isinstance(df, pl.LazyFrame):
    df = df.collect()
df.write_parquet("data/01_raw/test_1.parquet", use_pyarrow=True)
I think the error is coming from the dataset's save method (https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-6.0.0/_modules/kedro_datasets/polars/lazy_polars_dataset.html#LazyPolarsDataset), in particular this line:
save_method(file=fs_file, **self._save_args)
I'm unsure how to solve the issue, but I'm fairly sure that's what is causing the error above. For my purposes, a manual save to an output file path will work. But I'm curious whether other people have flagged this issue when saving Polars dataframes as Parquet files. (If it's a CSV, the Kedro catalog works fine!)
Juan Luis
07/14/2025, 11:52 AM
What happens if you remove use_pyarrow from your df.write_parquet call?
Zubin Roy
07/14/2025, 12:57 PM
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
So does that mean it's a Polars issue? Or when we save files using the Kedro catalog, are we using write_parquet without that argument?
Elena Khaustova
07/14/2025, 1:35 PMwrite_parquet
is used without use_pyarrow=True
, Polars defaults to its own rust-based parquet backend.Elena Khaustova
07/14/2025, 1:36 PM
You can pass use_pyarrow through save_args in the catalog entry:
df_2:
  type: polars.LazyPolarsDataset
  filepath: data/01_raw/test.parquet
  file_format: parquet
  save_args:
    use_pyarrow: true
Zubin Roy
07/14/2025, 2:19 PM