# questions
u
Hi all, I'm new to Kedro and I'm trying to create a custom dataset based on AbstractDataset, following this document: https://docs.kedro.org/en/0.18.14/kedro.io.AbstractDataset.html. I'm getting an error when I run it.
Error: DatasetError: An exception occurred when parsing config for dataset 'testAbstractCsv': Class 'kedrotest.myabstractDataSet.MyOwnDataset' not found or one of its dependencies has not been installed.
In kedro ipython I was able to import it without any issues:
In [1]: from pathlib import Path, PurePosixPath
   ...: import pandas as pd
   ...: from kedro.io import AbstractDataset
Kedro version: 0.18.14
In catalog.yml:
testAbstractCsv:
  type: kedrotest.myabstractDataSet.MyOwnDataset
  filepath: ${_dataset_filetype}/countries.csv
Absolute path: src/kedrotest/myabstractDataSet/MyOwnDataset.py
Thanks.
d
u
Hi Deepyaman, thank you for your quick response. Yes, I am following the same instructions, but I'm still encountering the same error: Class 'kedrotest.myabstractDataSet.MyOwnDataset' not found or one of its dependencies has not been installed. I am using the PyCharm editor for this project, with Python 3.11.5.
d
Can you share the full stack trace? The way errors were reported here was improved in 0.19.0; either a fuller stack trace or upgrading Kedro may help make it more obvious whether you've got an issue with your dataset implementation or something else.
My initial thought was that maybe you don't have the project path configured properly (to add `src`); however, given you used `kedro ipython`, I think it should be fine.
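As a generic illustration of why a full stack trace helps, here is a minimal Python sketch of a wrapper error that hides a root cause. `WrapperError`, `import_dataset_class`, and `parse_config` are hypothetical names made up for this example; whether Kedro 0.18.14 chains its errors in this way is an assumption that the thread does not confirm.

```python
import traceback


class WrapperError(Exception):
    """Hypothetical stand-in for a framework error that wraps another error."""


def import_dataset_class():
    # Simulate a root cause: a name that was never defined or imported.
    return UndefinedBaseClass  # noqa: F821 - deliberately undefined


def parse_config():
    try:
        import_dataset_class()
    except Exception as exc:
        # `from exc` keeps the original error attached as __cause__.
        raise WrapperError("Class not found or dependency missing") from exc


try:
    parse_config()
except WrapperError as err:
    # format_exception follows the chain, so the root cause is included.
    full_trace = "".join(
        traceback.format_exception(type(err), err, err.__traceback__)
    )
    root_cause = err.__cause__

print(full_trace)
```

In this sketch the short message alone says nothing about the undefined name, while the full trace shows both the wrapper and the underlying `NameError`.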
u
Hi Deepyaman, how do I generate that full stack trace?
n
Make sure you haven't forgotten the `__init__.py` files if it's a Python package.
What did you run with `kedro ipython`? Is the error only showing up when you do `kedro run` but not `kedro ipython`?
In any case, you can always run `pip install -e .` at the root to make sure `kedrotest` is installed as a package. Kedro does something behind the scenes so you don't have to do this during development, but you will likely need to install it later when you start writing tests.
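One way to check whether a package is visible to the current interpreter is to ask the import system directly. The sketch below uses only the standard library; `is_importable` is a hypothetical helper name, and `kedrotest` is the package name from this thread.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be found.
        return False


# A standard-library module should always be found.
print(is_importable("json"))

# The result for the project package depends on your environment,
# for example on whether the package has been installed.
print(is_importable("kedrotest"))
```

Running this with the same interpreter that runs `kedro run` helps rule out an environment mismatch, for example an IDE configured with a different interpreter.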
u
Hi Nok, thanks for your reply. Yes, there is an `__init__.py`.
Yes, the error shows up with `kedro run`.
C:.
│
│   MyOwnDataset.py
│   __init__.py <-----------------------
│
└───__pycache__
        MyOwnDataset.cpython-311.pyc
        __init__.cpython-311.pyc
n
What's the name of your dataset?
testAbstractCsv:
  type: kedrotest.myabstractDataSet.MyOwnDataset
  filepath: ${_dataset_filetype}/countries.csv
Absolute path: src/kedrotest/myabstractDataSet/MyOwnDataset.py
The type should point to the class, not the Python file.
Let's say you have
class MyDataset(AbstractDataset):
    ...
in `MyOwnDataset.py`; then the type should be `xxxxx.MyOwnDataset.MyDataset`.
In any case, the `type` is the import path of a Python object, so the two should be equivalent. Let's say you can import a dataset like this:
from a.b.c import Dataset
Then the equivalent `type` in `catalog.yml` is
type: a.b.c.Dataset
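The idea that a `type` string is just an import path can be sketched with the standard library. This is an illustration only, not Kedro's actual loading code; `resolve_type` is a hypothetical helper, and `collections.OrderedDict` stands in for a dataset class.

```python
import importlib


def resolve_type(dotted_path: str):
    """Split 'a.b.c.Name' into module 'a.b.c' and attribute 'Name', then load it."""
    module_path, _, object_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, object_name)


# Equivalent to: from collections import OrderedDict
cls = resolve_type("collections.OrderedDict")
print(cls)

# If the last segment is not an attribute of the module, the lookup fails.
try:
    resolve_type("collections.NoSuchClass")
except AttributeError as exc:
    print(f"lookup failed: {exc}")
```

Under this reading, a path that stops at the Python file names a module rather than a class, so the final segment must be the class name itself.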
u
Thanks a lot, Nok, that fixed my problem.
Mydataset:
    type: kedrotest.datasets.MyOwnDataset.MyOwnDataset
    filepath: data/01_raw/countries.csv
However, I am now getting a new error: DatasetError: An exception occurred when parsing config for dataset 'Mydataset': name 'AbstractDataset' is not defined
This is my MyOwnDataset class:
from pathlib import Path, PurePosixPath
from typing import Any, Dict
import pandas as pd
# from kedro.io import AbstractDataset
from kedro.io import AbstractDataSet
# from kedro.io.core import get_filepath_str, get_protocol_and_path

class MyOwnDataset(AbstractDataset[pd.DataFrame, pd.DataFrame]):
    def __init__(self, filepath, param1=True, param2=True):
        self._filepath = PurePosixPath(filepath)
        self._param1 = param1
        self._param2 = param2

    def _load(self) -> pd.DataFrame:
        return pd.read_csv(self._filepath)

    def _save(self, df: pd.DataFrame) -> None:
        df.to_csv(str(self._filepath))

    def _exists(self) -> bool:
        return Path(self._filepath.as_posix()).exists()

    def _describe(self):
        return dict(param1=self._param1, param2=self._param2)
n
# from kedro.io import AbstractDataset
from kedro.io import AbstractDataSet
You have commented out the import.
Which version of Kedro are you using? The latest versions prefer the `Dataset` spelling with a lowercase "s". If possible, it's better to stick with the lowercase form.
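The error message from the thread can be reproduced without Kedro. The sketch below uses a stand-in namespace (`fake_io`, hypothetical and not the real `kedro.io`) to show that binding the name `AbstractDataSet` does not define `AbstractDataset`, because Python identifiers are case-sensitive.

```python
import types

# Hypothetical stand-in for a module that exports the capital-"S" spelling.
# This is NOT the real kedro.io; it only illustrates the name lookup.
fake_io = types.SimpleNamespace(AbstractDataSet=object)

# Equivalent in spirit to importing only the capital-"S" name.
AbstractDataSet = fake_io.AbstractDataSet

try:
    # The base class uses a lowercase "s", a name that was never bound above.
    class MyOwnDataset(AbstractDataset):  # noqa: F821
        pass
except NameError as exc:
    message = str(exc)

print(message)
```

The printed message has the same form as the one in the error above: the class statement refers to a name that the active import never defined.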
u
# from kedro.io import AbstractDataset
from kedro.io import AbstractDataSet <-------
Kedro version: 0.18.14
n
Can you remove the uppercase `AbstractDataSet` import and import the lowercase `AbstractDataset` instead?
from kedro.io import AbstractDataset
Just this
u
Hi Nok, The solution fixed the issue. Thanks a lot for all your help, and I really appreciate your quick response.