Puneet Saini
04/28/2025, 8:20 AM
polars.LazyPolarsDataset, for which I assume the filepath needs to be a glob pattern. But since kedro-datasets>=6.0.0, the dataset checks whether the file itself exists without expanding the glob pattern, if one is passed in. Is this a bug, or am I doing something wrong?
Elena Khaustova
04/28/2025, 1:07 PM
Puneet Saini
04/28/2025, 1:39 PM
filepath: path/to/my/parquet_folder/*.parquet, on this line we are checking whether the file exists or not. Since filepath is a glob pattern and not a static file path, the check fails and the parquet files are not loaded.
Elena Khaustova
04/28/2025, 1:45 PM
Puneet Saini
04/28/2025, 1:53 PM
Traceback (most recent call last)

/home/circleci/project/project_folder/venv-env/lib/python3.10/site-packages/kedro/io/core.py:245 in load

  242 │   │   │   self._logger.debug("Loading %s", str(self))
  243 │   │   │
  244 │   │   │   try:
❱ 245 │   │   │   │   return load_func(self)
  246 │   │   │   except DatasetError:
  247 │   │   │   │   raise
  248 │   │   │   except Exception as exc:

/home/circleci/project/project_folder/venv-env/lib/python3.10/site-packages/kedro_datasets/polars/lazy_polars_dataset.py:205 in load

  202 │   def load(self) -> pl.LazyFrame:
  203 │   │   load_path = str(self._get_load_path())
  204 │   │   if not self._exists():
❱ 205 │   │   │   raise FileNotFoundError(errno.ENOENT, os.strerror(errno.EN
  206 │   │
  207 │   │   if self._protocol == "file":
  208 │   │   │   # With local filesystems, we can use Polar's build-in I/O

FileNotFoundError: [Errno 2] No such file or directory: '/home/circleci/project/project_folder/data/pipeline_1/*.parquet'
Elena Khaustova
04/28/2025, 2:46 PM
Puneet Saini
04/28/2025, 2:48 PM
Puneet Saini
04/28/2025, 2:48 PM
Elena Khaustova
04/28/2025, 2:50 PM
Deepyaman Datta
04/28/2025, 3:01 PM
"Let me know if checking glob pattern for exists is a good enough solution"
I think that's fine. It looks like the CI logs from before this check was added are gone, since it's been a while, but it was only added to fix breaking tests. I think the reason is that a Polars LazyFrame doesn't otherwise test file existence until you actually collect the result of some operation. You could try removing the check and seeing which test fails.
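For reference, a glob-aware existence check along the lines discussed could look roughly like the sketch below. It uses only the standard library; `path_or_glob_exists` is a hypothetical helper name, not part of kedro-datasets, and the real dataset goes through its fsspec filesystem rather than `os` and `glob`, so treat this as an illustration of the idea only.

```python
import glob
import os

# Characters that make a path a glob pattern rather than a literal path.
_GLOB_CHARS = frozenset("*?[")


def path_or_glob_exists(path: str) -> bool:
    """Return True if `path` exists or, for a glob pattern, matches at least one entry.

    Hypothetical helper for illustration; not the kedro-datasets implementation.
    """
    if any(ch in _GLOB_CHARS for ch in path):
        # iglob is lazy, so we stop as soon as the first match is found.
        return next(glob.iglob(path), None) is not None
    # Plain path: fall back to an ordinary existence check.
    return os.path.exists(path)
```

With a check like this, a filepath such as path/to/my/parquet_folder/*.parquet would pass as long as at least one matching file is present, while a plain path keeps the current behaviour.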
Elena Khaustova
04/28/2025, 3:31 PM