# questions
**Tomás Rojas:**
Hi everyone, I am currently using Kedro. I made a new class for a pandas DataFrame that holds another pandas DataFrame inside `self.props` to store some metadata. I am able to manipulate it, but when I try to save it as a partitioned dataset using pickle I get an error. This is my custom dataset:
```python
import pickle
from typing import Any, Dict

from kedro.io import AbstractDataSet


class PyMeasureDataSet(AbstractDataSet):
    def __init__(self, filepath: str):
        super().__init__()
        self.filepath = filepath

    def _load(self) -> PyMeasureDataFrame:
        try:
            print(self.filepath)
            props, data = read_pymeasure(self.filepath)
            props = parse_props(props)
            return create_custom_dataframe(data, props)
        except Exception:
            # Fall back to unpickling files previously written by _save
            with open(self.filepath, 'rb') as f:
                return pickle.load(f)

    def _save(self, data: PyMeasureDataFrame) -> None:
        with open(self.filepath, 'wb') as f:
            pickle.dump(data, f)

    def _describe(self) -> Dict[str, Any]:
        return {
            "type": "PyMeasure Data Frame",
            "filepath": self.filepath,
        }
```
and I am getting this error when trying to save:

```
FileNotFoundError: [Errno 2] No such file or directory: '/home/tomasrojasc/Documents/thesis/thesis/data/02_intermediate/pc1/2023-07-26/IVg2023-07-26_1.pk'
```
Also, this is my catalog:
```yaml
# Here you can define all your data sets by using simple YAML syntax.
#
# Documentation for this file format can be found in "The Data Catalog"
# Link: https://docs.kedro.org/en/stable/data/data_catalog.html

pc1:
  type: "PartitionedDataSet"
  path: "data/01_raw/pc1"
  dataset: "thesis.extras.datasets.pymeasure_data_set.PyMeasureDataSet"
  filename_suffix: ".csv"
  load_args:
    maxdepth: 2

pc2:
  type: "PartitionedDataSet"
  path: "data/01_raw/pc2"
  dataset: "thesis.extras.datasets.pymeasure_data_set.PyMeasureDataSet"
  filename_suffix: ".csv"
  load_args:
    maxdepth: 2

pc1_dp:
  type: "PartitionedDataSet"
  path: "data/02_intermediate/pc1"
  dataset: "thesis.extras.datasets.pymeasure_data_set.PyMeasureDataSet"
  filename_suffix: ".pk"
  load_args:
    maxdepth: 2

pc2_dp:
  type: "PartitionedDataSet"
  path: "data/02_intermediate/pc2"
  dataset: "thesis.extras.datasets.pymeasure_data_set.PyMeasureDataSet"
  filename_suffix: ".pk"
  load_args:
    maxdepth: 2
```
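For context on where the failing path comes from: a partition key like `2023-07-26/IVg2023-07-26_1` would resolve to exactly the path in the traceback above. This is a simplified sketch of that resolution (the `partition_filepath` helper is illustrative only, not Kedro's actual implementation, which goes through fsspec):

```python
import os

def partition_filepath(base_path: str, partition_id: str, filename_suffix: str = "") -> str:
    # Assumption: PartitionedDataSet joins the catalog `path` with each
    # partition id, appends `filename_suffix`, and hands the result to the
    # underlying dataset's `filepath` argument.
    return os.path.join(base_path, partition_id) + filename_suffix

fp = partition_filepath(
    "data/02_intermediate/pc1", "2023-07-26/IVg2023-07-26_1", ".pk"
)
# A partition id containing "/" implies a subdirectory under `path`,
# which must exist (or be created) before the file can be written.
assert fp.endswith("2023-07-26/IVg2023-07-26_1.pk")
```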
Does anyone have an idea of why this could be?
**j:**
Hi @Tomás Rojas, when you say you get a `FileNotFoundError`, what absolute filepath were you expecting? Also, what Kedro version are you using?
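One plausible cause, reproduced below: Python's `open(..., 'wb')` raises `FileNotFoundError` (`Errno 2`) when the file's parent directory does not exist, and the date subdirectory (`.../pc1/2023-07-26/`) may not have been created before `_save` ran. A minimal sketch under that assumption; the `save_with_dirs` helper is hypothetical, not part of Kedro or the original code:

```python
import os
import pickle
import tempfile
from pathlib import Path

def save_with_dirs(filepath: str, data) -> None:
    """Hypothetical variant of _save that creates missing parent directories."""
    Path(filepath).parent.mkdir(parents=True, exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(data, f)

with tempfile.TemporaryDirectory() as tmp:
    # Partition path whose date subdirectory does not exist yet,
    # mirroring .../pc1/2023-07-26/IVg2023-07-26_1.pk from the traceback.
    target = os.path.join(tmp, "2023-07-26", "part_1.pk")

    try:
        with open(target, "wb") as f:   # plain open() fails: parent dir missing
            pass
    except FileNotFoundError as exc:
        assert exc.errno == 2           # same Errno 2 as in the error above

    save_with_dirs(target, {"x": 1})    # mkdir-first variant succeeds
    with open(target, "rb") as f:
        assert pickle.load(f) == {"x": 1}
```

If this is the cause, adding the `mkdir` call at the top of `_save` (or switching to fsspec's `makedirs`) would be one way to work around it.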